AD-A167 336  DISTRIBUTED COMPUTING FOR SIGNAL PROCESSING:  1/1

TOPOLOGICAL PROPERTIES OF IN.. (U) PURDUE UNIV LAFAYETTE
IN  R R SEBAN  DEC 85  ARO-18790.17-EL-APP-E
UNCLASSIFIED  DAAG29-82-K-0101  F/G 9/2  NL


AD-A167  336 


TOPOLOGICAL  PROPERTIES  OF 
INTERCONNECTION  NETWORKS 
FOR  PARALLEL  PROCESSORS 


Ph.D.  Thesis  by: 

Robert  R.  Seban 

Faculty  Advisor: 

Howard  Jay  Siegel 


Appendix  E  for 

Distributed  Computing  for 
Signal  Processing: 
Modeling  of  Asynchronous 
Parallel  Computation 


Final  Report 


U.S.  Army  Research  Office 
Contract  No.  DAAG29-82-K-0101* 




*Chapters 1 through 8 supported by this contract.


This document has been approved
for public release and sale; its
distribution is unlimited.




TOPOLOGICAL  PROPERTIES  OF  INTERCONNECTION 
NETWORKS  FOR  PARALLEL  PROCESSORS 
-  A  UNIFIED  APPROACH 

A  Thesis 

Submitted  to  the  Faculty 

of 

Purdue  University 

by 

Robert  R.  Seban 

In  Partial  Fulfillment  of  the 
Requirements  for  the  Degree 

of 

Doctor  of  Philosophy 

December  1985 




ACKNOWLEDGMENTS 


I would like to thank my major advisor Professor H. J. Siegel and my
committee members Professor E. J. Coyle, Professor S. E. Hambrusch, and
Professor D. G. Meyer for their time and effort.

Chapters 4 and 5 were supported in part by the U.S. Army Research
Office, Department of the Army, under contract number DAAG29-82-K-0101.
Chapters 6, 7, and 8 were supported in part by a David Ross Grant
1984-1985, under contract number 0857-58-12855, and by the U.S. Army
Research Office, Department of the Army, under contract number
DAAG29-82-K-0101. Chapter 9 was supported by IBM Federal Systems
Division, under contract number 280662B-YD. Chapter 10 was supported in
part by National Science Foundation Grant under contract number
ECS-8120806.


TABLE  OF  CONTENTS 


LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 INTRODUCTION

2 ORGANIZATION OF THE THESIS

3 PARALLEL COMPUTER ARCHITECTURES
3.1 Introduction
3.2 Overview
3.3 Problem Statement
3.4 Parallel Computer Architecture Classes
3.5 Conclusions

4 MODELING OF NETWORKS AND SYSTEMS
4.1 Introduction
4.2 Overview
4.3 Problem Statement
4.4 Previous Work
4.5 Basic Concepts
4.6 Interconnection Network Model
4.7 Systems and Subsystems
4.8 Conclusions

5 QUASIMORPHISM AND EMULATION
5.1 Introduction
5.2 Overview
5.3 Problem Statement
5.4 Previous Work
5.5 Basic Concepts
5.6 Quasimorphism
5.7 Emulation of Systems
5.8 Conclusions

6 SINGLE STAGE NETWORKS - ANALYSIS
6.1 Introduction
6.2 Overview
6.3 Problem Statement
6.4 Previous Work
6.5 Basic Concepts
6.6 Composition and Decomposition of Networks
6.7 Partitionability Algorithm
6.8 Conclusions

7 SINGLE STAGE PARTITIONABLE NETWORKS - SYNTHESIS
7.1 Introduction
7.2 Overview
7.3 Problem Statement
7.4 Previous Work
7.5 Basic Concepts
7.6 Synthesis of Single Stage Partitionable Networks
7.7 Conclusions

8 MULTISTAGE NETWORKS - ANALYSIS
8.1 Introduction
8.2 Overview
8.3 Problem Statement
8.4 Previous Work
8.5 Basic Concepts
8.6 Multistage Network Model and Applications
8.7 Conclusions

9 DATA COMMUNICATION IN A REAL-TIME SYSTEM
9.1 Introduction
9.2 Overview
9.3 Problem Definition
9.4 Basic Concepts
9.5 DMA - Direct Memory Access
9.6 Architecture of the Fault Tolerant Crossbar
9.7 Network Architectures
9.8 Fault Detection and Recovery
9.9 Conclusions

10 SHUFFLING WITH THE ILLIAC AND PM2I SIMD NETWORKS
10.1 Introduction
10.2 Overview
10.3 SIMD Machines
10.4 The Interconnection Networks
10.5 Shuffling with the PM2I Network
10.6 Shuffling with the Illiac Network
10.7 Conclusions

11 CONCLUSIONS

LIST OF REFERENCES


LIST  OF  TABLES 


Table

5.1: Comparison of efficiency of different types of emulation

7.1: The multiplication table for <C, 7>, Example 7.6.10

7.2: The multiplication table for <C, 7>, Example 7.6.11

10.1: The idea underlying the algorithm for the PM2I to perform the
shuffle, shown for N = 8

10.2: Example of the algorithm for performing the shuffle using the
PM2I when N = 8. It is assumed that initially the DTR of PE P contains
the integer P, 0 ≤ P < 8. The dotted line shows the movement of the
data originally in the DTR of PE 5 (= 101)


LIST  OF  FIGURES 


Figure

4.1: An example of a topologically arbitrary network

4.2: Example of a system

4.3: Example of a fully recirculating system

4.4: Example of a partially recirculating system

4.5: Example of a nonrecirculating system

5.1: Genealogy of the maps and correspondences

5.2: Systems for Example 5.6.12

5.3: Systems for Example 5.6.13

5.4: Systems for Example 5.7.3

5.5: Systems for Example 5.7.4

8.1: Multistage network for Example 8.6.2

8.2: Multistage network for Example 8.6.3

8.3: Multistage network for Example 8.6.4

8.4: Multistage network for Example 8.6.5

9.1: Signal data transfer parameters for three iterations of an
evolutionary distributed processing system

9.2: A signal data switching configuration for front end PEs

9.3: PEs and their associated swinging buffered memories

9.4: The architecture of the communication system: D - data,
C - control, R - report, AS - PE source address, AD - PE destination
address

9.5: Source DMA architecture

9.6: Destination DMA architecture

9.7: Implementation of a 4x4x8 crossbar using bit slicing

9.8: Implementation of a 4x4x8 crossbar using cascading

9.9: Block diagram of a type I chip

9.10: The data path for output port i (DO i)

9.11: Block diagram of a type 0 chip

9.12: Network architecture scheme 1

9.13: Network architecture scheme 2

9.14: Network architecture scheme 3

10.1: PE-to-PE SIMD machine configuration, with N PEs

10.2: Illiac network for N = 16. (The actual Illiac IV SIMD machine had
N = 64.) Vertical lines are $+\sqrt{N}$ and $-\sqrt{N}$. Horizontal
lines are +1 and -1.

10.3: PM2I network for N = 8. (a) $PM2_{+0}$ connections, (b) $PM2_{+1}$
connections, (c) $PM2_{+2}$ connections. For the $PM2_{-i}$ connections,
0 ≤ i ≤ 2, reverse the direction of the arrows.

10.4: Shuffle-Exchange network for N = 8. Dashed line is shuffle, solid
line is exchange.


ABSTRACT 


Seban,  Robert  R.  Ph.D.,  Purdue  University.  December  1985. 
TOPOLOGICAL  PROPERTIES  OF  INTERCONNECTION  NETWORKS 
FOR  PARALLEL  PROCESSORS  -  A  UNIFIED  APPROACH.  Major 
Professor:  Howard  Jay  Siegel. 

Two methods are used to speed up the execution of a computational task.
One is new technology development and the other is the exploitation of
parallelism in the computation. To take advantage of the parallelism in a
task requires the utilization of parallel computer architectures. At a
certain high level of abstraction a parallel computer system is represented
as a graph where the nodes represent processors, memories, or other devices,
and the edges represent the communication links.

In this research the following problems of parallel processing are studied.
First is a theoretical study of topological properties of interconnection
networks. Second is a case study of a network design for a real-time system.
Third is a study of the use of SIMD networks for performing “shuffles.”

A general model that can be used to describe networks and systems with
arbitrary topologies is developed. Based upon the idea of morphism of
groups, the concept of morphism of systems is developed. The morphism of
systems is called quasimorphism and allows a method of comparison between
topologically arbitrary parallel computer systems. The quasimorphism is used
to study the emulation of one system by another.

The composition, decomposition, and partitionability of single stage
networks are studied. Informally, the partitionability property means that
the network can be divided into several parts each with a degree of
independence. The synthesis of single stage partitionable interconnection
networks is examined. The application of the model to multistage networks is
discussed.

A case study of the design of a network for a real-time signal processing
system is performed. A network and network interfaces are designed for a
distributed digital signal processing system subject to high throughput,
extendibility, fault tolerance, and other constraints.

The data permuting ability of single stage SIMD networks is studied.
Specifically, algorithms for the PM2I and Illiac networks to perform the
“shuffle” data permutations are developed.


1 INTRODUCTION

Two methods are used to speed up the execution of a computational task.
One is new technology development and the other is the exploitation of
parallelism in the computation. To take advantage of the parallelism in the
task requires the utilization of parallel computer architecture [KuL78, ThW75].
There are two major classes of parallel computer system architectures: loosely
coupled, where the information transfer is infrequent, and tightly coupled,
where information transfer is frequent, perhaps every operation cycle. In this
research the primary concern is the class of tightly coupled parallel computer
systems.

At a certain high level of abstraction a parallel computer system is
represented as a graph where the nodes represent processors, memories, or
other devices and the edges represent the communication links. This
representation is frequently used by researchers and is based upon the belief
that one of the salient features of a parallel computer system is the topology of
the interconnection network and the way the processors and other devices are
connected to it. Although the graph depiction of the system contains a large
amount of information, it does not convey the dynamic structure of a
reconfigurable network. The model developed in this research embodies that
information.
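
To make the limitation of the static graph concrete, the following sketch describes a small reconfigurable network as a set of interconnection functions and derives its static graph as the union of their edges. It is only an illustration and not the network model developed in Chapter 4; the four-node size and the names `plus_one`, `plus_two`, `static_graph`, and `edges_of` are assumptions made for this example. The graph records which links exist, but not which links are active together during a single transfer.

```python
# Illustrative sketch only: the 4-node network and all names below are
# assumptions for this example, not the model defined in the thesis.
N = 4  # number of nodes


def plus_one(p):
    """Hypothetical interconnection function: node p sends to p + 1 mod N."""
    return (p + 1) % N


def plus_two(p):
    """Hypothetical interconnection function: node p sends to p + 2 mod N."""
    return (p + 2) % N


functions = {"plus_one": plus_one, "plus_two": plus_two}


def static_graph(funcs):
    """Union of the directed edges produced by every interconnection function."""
    return {(p, f(p)) for f in funcs.values() for p in range(N)}


def edges_of(name):
    """Edges that are active when one function is selected for a transfer."""
    return {(p, functions[name](p)) for p in range(N)}


graph = static_graph(functions)
```

The set `graph` contains all eight links, while `edges_of("plus_one")` and `edges_of("plus_two")` each select four of them; that grouping is the information the static graph alone does not carry.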

Much research has been devoted to the study of several topologically regular
interconnection networks. Amongst the best known networks are Illiac
[BoD72], Shuffle [LaS76], Omega [Law75], multistage Cube [AdS82b], STARAN
[Bat76], ADM [McS82], k-connected mesh [NaS80], and PM2I [SeS84b]. The
researcher usually proceeded as follows: he devised a model for the network of
interest and derived analytical results based on that model. This approach has
the drawback that the results are network specific since the model is network
specific and sometimes implementation dependent.

Our research differs from the past work in several aspects. First, a unified
approach to the analysis of interconnection networks that is valid for large
classes of interconnection networks is developed. Second, several algorithms
that allow systematic analysis and design of networks with the desired property
of partitionability are developed. In more detail, the following related
topics of topological properties of parallel computer systems are studied.

In Chapter 3, the background of parallel computer architecture is
presented. Numerous parallel computer systems have been discussed in the
literature and proposed, and several have been built. Parallel systems are
divided into two major classes, tightly coupled and loosely coupled. The
subject of analysis here is the tightly coupled parallel systems group, which can
be divided into several categories.

It is shown that each type of parallel computer architecture requires one or
more interconnection networks. Some systems use networks dedicated to the
communication between particular subsystems; other systems use a single
network multiplexed for communication among different parts of the system.
In an ensemble parallel system the network is used by the control unit to
broadcast instructions and data [ThW75]. In a pipelined system the
interconnection network is used to provide data communication among the
computational units (segments) of the pipeline [Bae80]. In vector and array
parallel systems one network is used for interprocessor communication and a
usually separate network is used by the control unit to broadcast data,
instructions, and control information to the processors [BaB68]. In a systolic
system the network is used to propagate the wave of partial results from a set
of processors to the next set of processors [KuL78]. In an associative system
the control unit uses the network to broadcast selected data fields to the
processors for comparison, and in some cases another network is used for
interprocessor communications [Bat74]. Reconfigurable systems have a
network that allows the system to be statically or dynamically restructured
into multiple machines of different sizes [SiS84]. A data flow system consisting
of multiple rings needs a communication network to move data among rings
[WaG82].

In  Chapter  4  a  general  model  of  single  stage  interconnection  networks  is 
developed  [SeS84a].  This  model  is  sufficiently  general  so  that  it  can  be  used  to 
model  networks  with  an  arbitrary  topology,  including  both  regular  and 
irregular  topologies.  The  model  is  independent  of  the  method  of 
implementation  of  the  network.  This  is  necessary  because  properties  of 
networks  such  as  similarity  relationships,  emulation,  and  partitionability  of 
networks  are  implementation  independent. 

The  model  together  with  additional  information  is  then  used  to  construct 
a  model  for  parallel  computer  systems.  A  system,  informally,  consists  of  a  set 
of  devices,  an  interconnection  network,  and  a  method  for  use  of  the  network. 
Each  device  is  assumed  to  have  two  logical  ports,  an  input  port  and  an  output 
port,  possibly  implemented  physically  as  the  same  set  of  I/O  pins.  Some 
examples  of  devices  are  processors,  memories,  or  processor/memory  pairs. 
Based  upon  the  use  of  the  network,  three  types  of  systems,  recirculating, 
nonrecirculating,  and  partially  recirculating,  are  defined.  Relationships 
between  systems  such  as  equality  and  three  types  of  subsystems  are  rigidly 
defined  and  their  properties  explored. 


In Chapter 5 a technique is developed to measure the similarity of two
systems. This generalizes the past work on similarity used by many researchers,
which classifies the relationship between two networks into only two kinds: (a)
the networks are isomorphic or (b) the networks are not isomorphic. The
measure has a number of uses and is applied in this chapter to the analysis of
emulation. Our definition of emulation is a generalized case of the one
described in [FiF82].

Previous work related to the research developed here can be found in the
classification of groups in the field of abstract algebra and group theory [Han68,
Her75]. The theory of group classification is based upon the concept of
morphism. Morphism measures the similarity of behavior between the group
operations of two groups. This measure ignores the labeling of the elements of
the groups and is concerned strictly with the structure which is determined by
the group operation.
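
As a small illustration of the group-theoretic notion referred to above (not of the quasimorphism itself, which is defined in Chapter 5), the following sketch checks the defining property of a group morphism, f(a * b) = f(a) * f(b), by exhaustive enumeration. The groups Z4 and Z2 and the names `is_homomorphism`, `parity`, and `mod3` are assumptions chosen for this example.

```python
from itertools import product


def is_homomorphism(f, elems_g, op_g, op_h):
    """True if f(op_g(a, b)) == op_h(f(a), f(b)) for every pair in elems_g."""
    return all(
        f(op_g(a, b)) == op_h(f(a), f(b))
        for a, b in product(elems_g, repeat=2)
    )


Z4 = range(4)


def add_mod4(a, b):
    return (a + b) % 4


def add_mod2(a, b):
    return (a + b) % 2


def add_mod3(a, b):
    return (a + b) % 3


def parity(a):
    """Map from Z4 to Z2 that preserves the group operation."""
    return a % 2


def mod3(a):
    """Map from Z4 to Z3 that does not preserve the group operation."""
    return a % 3


parity_ok = is_homomorphism(parity, Z4, add_mod4, add_mod2)
mod3_ok = is_homomorphism(mod3, Z4, add_mod4, add_mod3)
```

Here `parity_ok` is true and `mod3_ok` is false: the check depends only on how the operations behave, not on how the elements happen to be labeled.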

Based upon the idea of morphism of groups, the concept of morphism of
systems is developed [SeS84a]. In the domain of parallel computer systems the
structure of interest is the structure of the correspondences of the system’s
network in the graph theoretical sense. The morphism of systems is called
quasimorphism and allows a method of comparison between topologically
arbitrary parallel computer systems. The quasimorphism facilitates the
analysis of the following problems in parallel processing: system emulation,
multiple mapping of a problem into a system for increased reliability, and
partitioning of systems. The quasimorphism is analyzed with respect to
properties similar to the properties of reflexivity, symmetry, and transitivity.

Also in this chapter the problem of emulation of one system by another is
discussed. Three different types of emulation are considered. Several efficiency
measures of the emulation are defined and the three types of emulation are
evaluated using these criteria.

In Chapter 6 the composition, decomposition, and partitionability of single
stage networks are studied [SeS85]. Informally, the partitionability property
means that the network can be divided into several parts, each of which has a
certain degree of independence. The type of partitionability analyzed in this
chapter has three subtypes.

The partitionability property of interconnection networks in the context of
parallel computer systems has the following advantages, besides being
interesting from the theoretical point of view.

(1) If the network is partitionable then the resource allocation of only a
    subset of the total resources is possible. This can be used as follows.

    (a) A user can utilize only a small part of the machine for the program
        development phase.

    (b) In a multiple user environment the partitioning provides a natural
        protection among users.

    (c) In a multitasking environment the partitioning provides a protection
        among independent tasks.

(2) If the network is partitionable, the fault tolerance of the system
    increases as follows.

    (a) A method of graceful degradation is possible by separating the faulty
        section from the correctly operating ones.

    (b) If in addition to being a partitionable network, the sections are
        isomorphic, then an increase of reliability may be realized by multiple
        mappings of the same task onto the multiple sections and tandem
        cross checking of partial results.

    (c) It is possible to construct a link and switching element fault
        tolerant network using a partitionable network as a core.

(3) If the network is partitionable, then there is an efficient implementation
    in terms of hardware and control. The network can be implemented as a set
    of network components each with its own set of inputs and outputs.
    Consequently the data path layout and in some instances the control lines
    layout on a VLSI chip or on a printed circuit board can be simplified.

An algorithm to classify partitionability of interconnection networks is
developed which will output one of the following:

(1) The network is not partitionable.

(2) The network is partitionable into subnetworks with common control
    signals and the combination of the subnetworks will exactly generate all
    interconnection patterns of the original network.

(3) The network is partitionable into subnetworks with separate control
    signals and the combination of the subnetworks will exactly generate all
    interconnection patterns of the original network.

(4) The network is partitionable into subnetworks with separate control
    signals and the combination of the subnetworks will generate a superset of
    interconnection patterns of the original network.

The algorithm is general in the sense that it will accept as an input a
topologically arbitrary interconnection network.

In Chapter 7 the synthesis of single stage interconnection networks with
the partitionability property is studied. Several different techniques are
developed, each of which can be used to construct a large class of single stage
partitionable networks. The algorithms are presented for a simplified case, but
they can be easily generalized in a number of different ways.

In the first part of this chapter an algorithm to generate a large class of
partitionable networks is developed and proven correct. This algorithm is
based upon the results of the analysis presented in Chapter 6.

The second part of this chapter discusses the problem of synthesis of a
special case of partitionable networks. This special class of networks consists
of those networks that are isomorphic to a direct product of groups [Han68,
Her75]. Since these groups have been studied extensively in abstract algebra,
techniques are known to determine the possibility of decomposition
of a given group into a direct product of groups.
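
The group-theoretic fact relied upon here can be illustrated with two small cyclic groups. The sketch below is an assumption-laden example, not the synthesis technique of Chapter 7: it verifies by enumeration that Z6 decomposes as the direct product Z2 x Z3, and uses element orders to show that Z4 does not decompose as Z2 x Z2. The names `to_pair`, `order`, and `orders` are introduced only for this illustration.

```python
from itertools import product


def to_pair(x):
    """Candidate isomorphism from Z6 to Z2 x Z3."""
    return (x % 2, x % 3)


def add_pair(a, b):
    """Componentwise operation of the direct product Z2 x Z3."""
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 3)


# Z6 is isomorphic to Z2 x Z3: the map is one-to-one and preserves addition.
is_bijective = len({to_pair(x) for x in range(6)}) == 6
preserves_op = all(
    to_pair((a + b) % 6) == add_pair(to_pair(a), to_pair(b))
    for a, b in product(range(6), repeat=2)
)


def order(x, op, identity):
    """Smallest n >= 1 such that x combined with itself n times is the identity."""
    n, y = 1, x
    while y != identity:
        y = op(y, x)
        n += 1
    return n


def orders(elems, op, identity):
    """Sorted list of element orders; isomorphic groups have equal lists."""
    return sorted(order(x, op, identity) for x in elems)


z4_orders = orders(range(4), lambda a, b: (a + b) % 4, 0)
z2xz2_orders = orders(
    list(product(range(2), repeat=2)),
    lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2),
    (0, 0),
)
```

Because `z4_orders` and `z2xz2_orders` differ, no relabeling of elements can turn Z4 into Z2 x Z2, whereas Z6 passes both checks for Z2 x Z3.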

In Chapter 8 the analysis of multistage networks is addressed. This
extends the work done in Chapter 6 into the domain of multistage
interconnection networks.

First a method of composition of single stage networks is presented and its
properties studied. Using the composition of single stage networks, the
multistage model is defined. This approach has the advantage that some
results of the analysis of single stage networks can be applied to the study of
multistage networks. The model is very general since each stage consists of the
general single stage model presented earlier. Several examples of an application
of the multistage model are presented.

In Chapter 9 a case study of a communication system for a real-time,
distributed digital signal processing system is presented. A network and network
interfaces are designed subject to a number of system constraints such as very
high throughput, system extendibility, and fault tolerance requirements. For this
application, and given the current and near future technology, a crossbar based
interconnection network was selected for the task under consideration. Two
different fault tolerant chip architectures are presented. Four network
architectures are designed and their characteristics are discussed. Several fault
detection and recovery techniques on the system level are developed.

In Chapter 10, a study of shuffle interconnection function emulation by
PM2I and Illiac SIMD networks is performed. It was previously shown that a
lower bound on the number of transfers needed for the PM2I network to
perform the shuffle is $\log_2 N$. The algorithm described here is near optimal
and requires only $(\log_2 N) + 1$ transfers. Also, an algorithm is presented for
the case where a machine has a PM2I network and it is desired to emulate a
shuffle that is of smaller size than the host network. Using the PM2I algorithm
as a basis, an algorithm for the Illiac to emulate the shuffle is given. It requires
$2\sqrt{N} - 1$ transfers, which is only three transfers more than the lower
bound of $2\sqrt{N} - 4$ shown previously.
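
For readers unfamiliar with these interconnection functions, the following sketch gives the definitions commonly used in the literature (the shuffle as a one-bit left rotation of the PE address, PM2I as addition or subtraction of a power of two modulo N, and the Illiac as plus or minus 1 and plus or minus the square root of N modulo N) and tabulates the transfer counts quoted above. The definitions are taken as assumptions here, since this chapter does not state them; the function names are hypothetical, N is assumed to be a power of two, and for the Illiac also a perfect square. The shuffle algorithms themselves are not reproduced.

```python
from math import isqrt


def shuffle(p, N):
    """Shuffle of PE address p: rotate its n-bit address left by one bit."""
    n = N.bit_length() - 1  # n = log2(N), assuming N is a power of two
    return ((p << 1) | (p >> (n - 1))) & (N - 1)


def pm2(p, i, N, sign=+1):
    """PM2I function: p + 2**i mod N (sign=+1) or p - 2**i mod N (sign=-1)."""
    return (p + sign * 2**i) % N


def illiac_neighbors(p, N):
    """Illiac functions: +1, -1, +sqrt(N), -sqrt(N), all modulo N."""
    r = isqrt(N)  # assumes N is a perfect square
    return [(p + 1) % N, (p - 1) % N, (p + r) % N, (p - r) % N]


def transfer_counts(N):
    """Transfer counts quoted in the text for a shuffle on N PEs."""
    n = N.bit_length() - 1
    r = isqrt(N)
    return {
        "pm2i_lower_bound": n,            # log2(N)
        "pm2i_algorithm": n + 1,          # log2(N) + 1
        "illiac_lower_bound": 2 * r - 4,  # 2*sqrt(N) - 4
        "illiac_algorithm": 2 * r - 1,    # 2*sqrt(N) - 1
    }


shuffle_of_8 = [shuffle(p, 8) for p in range(8)]
counts_64 = transfer_counts(64)
```

For N = 8 the shuffle sends PE 5 (= 101) to PE 3 (= 011). For N = 64, the size of the Illiac IV, the quoted counts are 6 and 7 transfers for the PM2I and 12 and 15 transfers for the Illiac.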


2 ORGANIZATION OF THE THESIS


In Chapter 3 an overview of several classes of tightly coupled parallel
computer architectures will be given. First the defining features of each class
will be presented, and then an example of the class will be discussed in detail.
All the examples consist of existing systems or systems in research or design
stages which have been described in the literature.

In  Chapter  4  the  network  model  is  presented.  The  model  together  with 
additional  information  is  then  used  to  define  the  model  of  a  parallel  computer 
system.  Three  types  of  systems  based  upon  the  method  of  use  of  the  network 
are  defined  and  examples  of  each  category  given. 

In  Chapter  5  a  measure  of  similarity  of  systems  with  arbitrary  labeling 
and  topology  is  introduced.  The  measure  is  called  quasimorphism  and  is  used 
in  this  chapter  to  analyze  emulation  of  one  system  by  another. 

In Chapter 6 the horizontal composition and decomposition of
interconnection networks are formally defined and analyzed. Using the
compositions, three types of partitionable single stage networks are recognized.
An algorithm is presented that accepts as an input a topologically arbitrary
interconnection network and outputs one of the following four outcomes: the
network is not partitionable, or the network is partitionable in one of the three
types.

In  Chapter  7  the  synthesis  of  single  stage  partitionable  networks  is 
studied.  An  algorithm  is  presented  to  synthesize  a  large  class  of  partitionable 
networks.  In  addition,  a  special  class  of  partitionable  interconnection  networks 
that  are  isomorphic  to  a  direct  product  of  groups  is  described. 


In Chapter 8 the analysis of multistage networks is discussed. Basic
definitions such as vertical composition of networks are presented and their
properties analyzed. Using composition of single stage networks, the multistage
network model is defined and some applications are shown.

In Chapter 9 a network and network interfaces are designed for a
real-time, distributed digital signal processing system. The design is subject
to a number of system constraints such as very high throughput, system
extendibility, and fault tolerance requirements. Several fault detection and
recovery techniques on the system level are studied, since fault tolerance is a
salient issue of this system.

In Chapter 10 the ability of the PM2I and Illiac type single stage SIMD
machine interconnection networks to perform the shuffle interconnection is
examined. Two algorithms are developed, one for the case of a PM2I of the same
size as the shuffle and one for the case of a PM2I of a larger size than the
shuffle. Both algorithms are near optimal in the number of network transfers.
In addition, using the PM2I algorithm as a basis, an algorithm for the Illiac to
emulate the shuffle is developed.


3 PARALLEL COMPUTER ARCHITECTURES


3.1 Introduction


One  method  of  speeding  up  the  execution  of  computational  tasks  is  to  use 
parallel  computer  architectures  which  exploit  the  parallelism  in  the  execution 
phase  of  the  task.  Numerous  parallel  computer  systems  have  been  discussed  in 
the  literature  and  proposed,  and  several  have  been  built.  Parallel  systems  are 
divided  into  two  major  classes,  tightly  coupled  and  loosely  coupled.  The 
subject  of  analysis  here  is  the  tightly  coupled  parallel  systems  group  which  can 
be  divided  into  several  categories. 

As will be shown, each type of parallel computer architecture requires one
or more interconnection networks. Some systems use networks dedicated to
the communication between particular subsystems; other systems use a
single network multiplexed for communication among different parts of the
system. In an ensemble parallel system the network is used by the control unit
to broadcast instructions and data. In a pipelined system the interconnection
network is used to provide data communication among the computational units
(segments) of the pipeline. In vector and array parallel systems one network is
used for interprocessor communication and a usually separate network is used
by the control unit to broadcast data, instructions, and control information to
the processors. In a systolic system the network is used to propagate the wave
of the partial results from a set of processors to the next set of processors. In
an associative system the control unit uses the network to broadcast the
selected data fields to the processors for comparison, and in some cases another
network is used for the interprocessor communications. Reconfigurable systems
use a network for interprocessor communication and perhaps a different
network for fetching/storing data in the memories. A data flow system
consisting of multiple rings needs a communication network to move data
among rings.


3.2  Overview 


In this chapter an overview of different classes of tightly coupled parallel
computer architectures will be given. Each class will be presented as follows.
First the defining features of the class will be presented, and then a
representative system of the class will be discussed. All the examples consist of
existing systems or systems in research or design stages described in the
literature. For a good survey of systems see [HaL82] and of interconnection
networks see [Sie85].


3.3  Problem  Statement 


Several categories of parallel computer architectures will be defined. This
will be followed by a detailed description of an example architecture in each
category. The description of the system will demonstrate that each category of
parallel computer architecture described uses one or more interconnection
networks, as well as show the different ways the networks are used by the
system.


3.4  Parallel  Computer  Architecture  Classes 


The ensemble processors achieve speedup of a computational task by
utilizing many processing elements, each of which operates on an independent
data stream. The system does not use an interprocessor interconnection
network; however, the control unit uses an interconnection network to transfer
data and instructions to the processors.

A representative of this group is the Parallel Element Processing Ensemble
(PEPE) [ThW75, ViC78], whose design can support up to 288 processors.
PEPE was developed to handle the tracking of multiple targets, and as such it
must perform identical operations on a large number of independent data
streams. These data streams are radar signal returns of possibly multiple
objects entering the radar's surveillance volume. PEPE also uses an associative
operation to locate the file of a target given its new data coordinates. This
operation is implemented by broadcasting the new data from the control
unit to the processors using the interconnection network. If a correlation is
found between the new data and a file in a processor, the new information is
added to that file; otherwise an idle processor is allocated to the new target.


The pipelined processors (MISD mode) achieve speedup of computation by
(a) breaking the instruction into a sequence of smaller operations and (b)
executing the smaller operations concurrently using several computational
units. The flow of data is such that unit $u_i$ executes its subtask and passes the
data to unit $u_{i+1}$, hence the term pipeline. Some systems that fall into this
category are TI ASC [Bae80, Sto80, The74], CRAY 1 [KoT80], and CYBER
205 [Bae80, KoT80].

The  TI  Advanced  Scientific  Computer  (ASC)  consists  of  an  instruction 
unit  and  from  one  to  four  processing  units.  The  instruction  unit  is  constructed 
as  a  four  stage  pipeline  and  the  stages  are:  instruction  fetch,  instruction 
decode,  effective  address  calculation,  and  register  operand  fetch.  All  processing 
units are identical and each consists of eight stages; however, using dynamic
reconfiguration (via a network) a custom pipeline can be constructed from the
basic eight elements. The stages of the processing unit are: input, exponent
subtractor,  prenormalizer,  multiplier,  adder,  normalizer,  accumulator,  and 
output. 

The vector and array processors (SIMD mode) achieve speedup of
computation by using a large number of computational elements. Examples of
their applications include image processing, such as filtering and
convolution, and matrix operations for weather prediction or simulation.
Some examples of these systems are Illiac IV [BaB68, BoD72], MPP [Bat80],
Cartesian Moment Computer (CMC) [ReS82, Seb82], and BSP [KoT80]. Two
examples will be discussed, the MPP and the BSP.

The Massively Parallel Processor (MPP) consists of 128 x 128 = 16384
simple processing elements. Each element processes data one bit wide (bit
serial). Each processor communicates with other processors in the array using
the four nearest neighbor interconnection network. The processing array uses
staging memories to reorder the data received from a satellite into a form
where each processor receives all the bits of the grey value of one pixel in the
image.

The Burroughs Scientific Processor (BSP) consists of 16 arithmetic units, each
capable of operating on 48 bit words. There is an input alignment network to
move data from the 17 memory units to the 16 arithmetic units and an output
alignment network to move the data from the 16 arithmetic units to the 17
memory units. The alignment networks allow a 16 x 16 matrix to be stored in
the 17 memories in such a manner that rows, columns, diagonals, and many other
substructures of the matrix can be fetched or stored without an access conflict
[BuK71].

The systolic arrays or wavefront processors [KuL78, Kun82] receive their
name from their mode of operation, which can be described as follows.
Systolic arrays are usually organized as one or two dimensional arrays of simple
processors, each connected to its neighbors in some regular way (two, three,
four, or six nearest neighbors). Each processor repeatedly executes the same
operation on data as it is pipelined through the systolic array, creating partial
results. Each partial result is passed to a neighboring processor, which combines
it with additional (partial) results to create a more complete result, until
finally the final result emerges at the output edge of the array.
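
As an illustration of this mode of operation, the following Python sketch
simulates a one dimensional array in which every cell applies the same
multiply-and-add step and hands its partial result to its right neighbor on
each clock tick. The choice of polynomial evaluation by Horner's rule and the
names systolic_horner, coeffs, and xs are assumptions made for this example;
they are not taken from [KuL78, Kun82].

```python
def systolic_horner(coeffs, xs):
    """Evaluate a polynomial at each x in xs on a simulated linear array.

    coeffs lists the coefficients from highest degree to lowest and must be
    non-empty; one cell is used per coefficient (hypothetical layout).
    """
    n = len(coeffs)
    # regs[i] is the (x, partial_result) pair latched at the output of cell i,
    # or None when the cell holds no data (a pipeline bubble).
    regs = [None] * n
    results = []
    # Pad the input stream with n bubbles so the pipeline drains completely.
    stream = list(xs) + [None] * n
    for item in stream:
        # Whatever the last cell latched on the previous tick leaves the array.
        if regs[-1] is not None:
            results.append(regs[-1][1])
        # Update cells right to left so each cell reads its left neighbor's
        # value from the previous tick.
        for i in range(n - 1, 0, -1):
            prev = regs[i - 1]
            if prev is None:
                regs[i] = None
            else:
                x, partial = prev
                regs[i] = (x, partial * x + coeffs[i])
        # The first cell starts a new partial result for the incoming x.
        regs[0] = None if item is None else (item, coeffs[0])
    return results


# 2x^2 + 3x + 1 evaluated at x = 0, 1, 2
values = systolic_horner([2, 3, 1], [0, 1, 2])
```

Once the pipeline is full, one finished result leaves the output edge on every
tick, even though each individual result needs one tick per cell to complete.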

The associative processors achieve speedup by operating in parallel on
a large number of records that are selected based on the value of a field in the
record. Examples in this category are STARAN [Bat74, FeF74, RoP77],
OMEN [Hig72], and ALAP [YaF77]. The ALAP will be described here.




The Associative Linear Array Processor (ALAP) consists of a linearly
connected array of processors that receive common data and commands from
the control unit. Their matching line outputs are “or”ed together to notify the
control unit if there is a match to the input data. A VLSI system consisting of
13 processors was constructed and tested. A bus is used to input individual as
well as common data into the processors; it would therefore become a
bottleneck if a large number of processors were used.

The reconfigurable systems consist of a large number of processors which
communicate through a reconfigurable interconnection network. Some
examples of this category are PASM [SiS79, SiS81, SiS84], TRAC [KaP80,
SeU80], and CHIP [Sny82]. The PASM system will be described here.

The partitionable SIMD/MIMD (PASM) system is currently under
development at Purdue University, School of Electrical Engineering. The
system includes $Q = 2^q$ Micro Controllers (MCs) and the Parallel Computation
Unit (PCU), which is comprised of $N = 2^n$ processors, N memory modules, and
an interconnection network.

The system's strength lies in its ability to allocate a subset of its N
processors to a particular task. For details on the allocation strategies see
[TuS83]. It is intended to be used in image processing and pattern recognition
applications. The collection of resources consisting of RN/Q processors
($R = 2^r$, $0 \le r \le q$), together with R Micro Controllers and RN/Q memories, is
called a virtual machine. The actual processors selected for a given virtual
machine depend upon the type of partitionability of the system and the
partition selected. The type of partitionability is a function of the
interprocessor interconnection network. The virtual machines are independent of
each other; consequently, different machines can execute different jobs
concurrently. PASM is currently in the logic design phase, and a small
prototype of 16 processors and four MCs is being built using off-the-shelf
logic devices.
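
The possible virtual machine sizes follow directly from the parameters above.
The short Python sketch below enumerates them under the assumption that r
ranges over 0, 1, ..., q inclusive; the function name virtual_machine_sizes is
hypothetical and is not part of the PASM design.

```python
def virtual_machine_sizes(n, q):
    """List (R, processors) pairs for N = 2**n processors and Q = 2**q MCs.

    Each virtual machine uses R = 2**r Micro Controllers and R*N/Q processors
    (and the same number of memories). Assumes 0 <= r <= q and q <= n.
    """
    if not 0 <= q <= n:
        raise ValueError("expected 0 <= q <= n")
    total_processors = 2 ** n
    total_controllers = 2 ** q
    sizes = []
    for r in range(q + 1):
        controllers = 2 ** r
        sizes.append((controllers, controllers * total_processors // total_controllers))
    return sizes


# The prototype described above: 16 processors and four MCs.
prototype = virtual_machine_sizes(4, 2)
```

For the prototype this gives machines with 1, 2, or 4 MCs controlling 4, 8,
or 16 processors respectively.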


Data flow systems achieve computational speedup by exploiting the
parallelism at the instruction level [WaG82]. Conceptually, each instruction is
translated into a template consisting of an operation and data slots. An
instruction gets executed if its data are available and a processor is available.
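
The firing rule can be sketched in a few lines of Python. This is only a
minimal illustration under simplifying assumptions: a single processor that is
always available, and a template format (operation, number of slots, list of
destination slots) invented for this example. The names run_dataflow,
templates, and tokens are hypothetical.

```python
from collections import deque
import operator


def run_dataflow(templates, tokens):
    """Execute templates as soon as all of their data slots are filled.

    templates: name -> (operation, number_of_slots, [(dest_name, dest_slot), ...])
    tokens:    iterable of (template_name, slot_index, value)
    Returns the results of templates that have no destination.
    """
    pending = deque(tokens)
    slots = {name: {} for name in templates}
    outputs = {}
    while pending:
        name, slot, value = pending.popleft()
        slots[name][slot] = value
        operation, arity, destinations = templates[name]
        if len(slots[name]) == arity:
            # All operands are present, so the instruction fires.
            args = [slots[name][i] for i in range(arity)]
            result = operation(*args)
            slots[name] = {}
            if destinations:
                # The result becomes a token for each destination slot.
                pending.extend((d, s, result) for d, s in destinations)
            else:
                outputs[name] = result
    return outputs


# (2 + 3) * (5 - 1); the tokens arrive in an arbitrary order.
templates = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, []),
}
tokens = [("sub", 1, 1), ("add", 0, 2), ("sub", 0, 5), ("add", 1, 3)]
result = run_dataflow(templates, tokens)
```

Note that the order of execution is determined by the arrival of data, not by
the order in which the templates are listed.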

Data flow computers are usually implemented as rings, each ring consisting
of at least the following blocks: a token queue, a matching store, and a
processing unit. The “token queue” saves results generated by the processing
unit. The “matching store” tries to match incoming tokens from the token
queue with the slots of templates currently residing in the matching store. The
“processing unit” accepts the instruction template with all its fields resolved
and executes the operation, passing the results to the token queue. Since
multiple rings, each consisting of a token queue, a matching store, and a
processing unit, are used to speed up execution, a token generated in one
ring may be needed as data in a template residing in the matching store of
another ring. In order for the token to move from one ring to another, an
interconnection network must be used to connect the data paths of different
rings.


3.5 Conclusions


In this chapter an overview of several major classes of tightly coupled
parallel computer systems was presented. Each category of parallel computer
system was described in sufficient detail to show that an essential part of each
system is one or more interconnection networks. The usage of the
interconnection network varies from system to system. Some systems use
networks dedicated to the communication between particular subsystems; other
systems use a single network multiplexed for communication among
different parts of the system. In an ensemble parallel system the network is
used by the control unit to broadcast instructions and data. In a pipelined
system the interconnection network is used to provide data communication
among the computational units (segments) of the pipeline. In a vector and
array parallel system one network is used for interprocessor communication and
a usually separate network is used by the control unit to broadcast data,
instructions, and control information to the processors. In a systolic system the
network is used to propagate the wave of the partial results from a set of
processors to the next set of processors. In an associative system the control
unit uses the network to broadcast the selected data fields to the processors for
comparison, and in some cases another network is used for interprocessor
communications. A reconfigurable system uses a network for interprocessor
communication and perhaps a different network for fetching/storing data in the
memories. A data flow system consisting of multiple rings needs a
communication network to move data among rings.




When a designer faces the problem of selecting a parallel computer
system for a particular task or a class of tasks, several properties of the
network become of interest. These properties are heavily dependent upon the
topology of the network, and therefore the study of the topological properties of
networks is an important method of evaluation and classification of parallel
computer systems.


Most current analytical and modeling techniques for interconnection
networks concentrate on the analysis of topologically regular interconnection
networks. Examples of such networks are
Illiac [BoD72], Shuffle [LaS76, SeS84b], multistage Cube [AdS82b], STARAN
[Bat74], ADM [McS82], k-connected mesh [NaS80], and PM2I [SeS84b].
Past research usually proceeded along the following lines. A network specific
model is defined and then analytical results are derived using this model. The
problem with this approach is that the results developed are problem specific,
that is to say, the results are valid only for the small class of networks that the
model represents. One way to generalize the results of the analysis is to
develop a general model describing the topology of the network.

In this chapter, the following problems are discussed. A general model
of networks with arbitrary topology is developed [SeS84a]. This
model is sufficiently general that it can be used to model networks with
arbitrary topology, regular or irregular. The model is independent
of the method of implementation of the network. This is necessary because
properties of networks such as similarity measures, emulation, and
partitionability are implementation independent. The similarity
between two networks is classified into several classes. This is a
refinement of the old system, which classified the similarity between two
networks into only two classes, isomorphic and nonisomorphic. The model of a
network, together with additional information, is then used to construct a model
for parallel computer systems. A system, informally, consists of a set of
devices, an interconnection network, and a method of use of the network. Each
device is assumed to have two logical ports, an input port and an output port,
possibly implemented physically as the same set of I/O pins. Some examples of
devices are processors, memories, or processor/memory pairs. Based upon the
use of the network, three types of systems are defined: recirculating,
nonrecirculating, and partially recirculating. Relationships between systems,
such as equality and three types of subsystems, are rigorously defined and their
properties explored.


4.2  Overview 


This  chapter  is  organized  as  follows.  In  section  4.3  definitions  of  the 
problems  addressed  in  this  chapter  are  given.  In  section  4.4  the  previous 
related  work  is  briefly  described.  In  section  4.5  the  basic  concepts  are  defined. 
In section 4.6 the network model is presented, several major relationships
between networks are described, their properties are given, and some examples
of applications are presented. In section 4.7 the concept of a parallel computer
system is formally introduced. Three types of systems based upon the method
of use of the network are defined and examples of each category given. Several
similarity measures between two systems are defined and examples presented.
In section 4.8 the conclusions and summary of the chapter are given.



4.3 Problem Statement


In this section, an informal description of the work presented in this
chapter is given. The descriptions will be informal only, as the basic
mathematical concepts have not been defined yet and will be introduced later
in this chapter. The following problems of analysis of interconnection networks
are addressed in this chapter. In order to analyze the topological properties of
interconnection networks a model must be developed. In this chapter, a
general model of topologically arbitrary interconnection networks is presented
which will be used through most of this research [SeS84b]. This model is
implementation independent, which is desired since the topological properties
of networks are implementation independent; moreover, if the model were
implementation dependent, its scope of applicability would be reduced to the
class of networks having that implementation. Next, several important
relationships between networks, such as equality and two types of subnetworks,
are defined and their properties shown. The model for a
parallel computer system is then defined. A system, informally, consists of a set
of devices, an interconnection network, and a use of the network. Each device
(processor, memory, or processor/memory pair) is assumed to have two logical
ports, one input port and one output port. Based upon the method of use of
the network, three types of systems are defined: recirculating, nonrecirculating,
and partially recirculating. Relationships between systems, such as
equality and three types of subsystems, are rigorously defined and their
properties explored.


4.4  Previous  Work 


In this section, the previous work is briefly described. Previous work on
modeling of interconnection networks in [Gok76, GoL73, LiM82, Upp81] was
used to describe the class of SW Banyan networks. The model is based on
graph theory and is sufficiently general to describe the class of SW Banyans;
however, it is implementation dependent, which narrows the scope of
its applicability. Some issues discussed using the model were methods of mapping
simple regular interconnection networks, such as a ring or a tree, onto the
Banyan networks. Additional work on modeling of regular networks, such as
mesh, shuffle, Cube, and PM2I, was done in [FiF82], and was network specific.
In [FiF82] a specific class of networks called quotient networks was discussed.
Informally, a quotient network is a network that is homomorphic to the same
type of network of a smaller size in terms of processors. A class specific model
was developed in [RaF83] for the evaluation of a class of linear array-processor
systems for VLSI implementation. A general model was developed for the
analysis of the time-space tradeoff of interconnection networks in [MaM81b]. A
good overview of interconnection networks can be found in [Sie85, WuF84].




4.5  Basic  Concepts 


In this section, basic definitions and notation needed as the background for
the rest of the paper are introduced. Some of the definitions can be found in
books on basic abstract algebra [Han68, Her75] and graph theory [BoM76,
Har69]; however, they are included here for completeness. The purpose of these
definitions is to develop a formal notation that will be used to discuss more
complex concepts such as networks and systems. To relate these definitions to
the subject at hand, some examples are given at the end of this section.

Let the set of input labels of a graph/algebraic structure be denoted by
$V_I$ and the set of output labels of the structure be denoted by $V_O$. All
graph/algebraic structures defined in this paper over $V_I \times V_O$ will assume that
$V_I \cap V_O = \emptyset$, $V_I \ne \emptyset$, where $\emptyset$ is the empty set and

$V_I \times V_O = \{ \langle v_a, v_b \rangle \mid v_a \in V_I,\ v_b \in V_O \}$.

The following notation will be used throughout this paper. The symbols
are enclosed in a pair of double quotation marks.

"{", "}" - delimiters for a set.
"(", ")" - function application and grouping of operations.
"<", ">" - delimiters for an n-tuple.
"[", "]" - defined in context.

Definition 4.5.1:

Let A be a set, then $P[A] \triangleq \{ B \mid B \subseteq A \}$ is the power set of A.




Definition 4.5.2:

Let $C_m \in P[V_I \times V_O]$, then $C_m$ is an I/O correspondence over $V_I \times V_O$.

Definition 4.5.3:

Let $C_m \in P[V_I \times V_O]$ be such that for any two distinct pairs
$\langle v_a, v_b \rangle, \langle v_c, v_d \rangle \in C_m$ we have $v_b \ne v_d$;
then $C_m$ is a nondestructive I/O correspondence over $V_I \times V_O$.
(Physically, $C_m$ represents one state of a reconfigurable network.)

Definition 4.5.4:

Let $C[V_I \times V_O] \triangleq \{ C_m \in P[V_I \times V_O] \mid C_m \text{ is nondestructive} \}$. Then
$C[V_I \times V_O]$ is called the C-set over $V_I \times V_O$.

Definitions 4.5.5 to 4.5.8 describe the connectivity or accessibility aspects of
the I/O correspondences.

Definition 4.5.5:

Let $C_m \in C[V_I \times V_O]$, then $s(C_m) \triangleq \{ v_a \mid \langle v_a, v_b \rangle \in C_m \}$ is the
source set of $C_m$.

Definition 4.5.6:

Let $C_m \in C[V_I \times V_O]$, then $d(C_m) \triangleq \{ v_b \mid \langle v_a, v_b \rangle \in C_m \}$ is the
destination set of $C_m$.

Definition 4.5.7:

Let $C = \{ C_m \mid m = 1, 2, \ldots, n \} \subseteq C[V_I \times V_O]$, then $s(C) \triangleq \bigcup_m s(C_m)$ is the
source set of C.

Definition 4.5.8:

Let $C = \{ C_m \mid m = 1, 2, \ldots, n \} \subseteq C[V_I \times V_O]$, then $d(C) \triangleq \bigcup_m d(C_m)$ is the
destination set of C.

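Example 4.5.9:

Let $V_I = \{v_0, v_1\}$ and $V_O = \{u_0, u_1, u_2, u_3\}$. Consider the set
$A = \{ \langle v_0, u_0 \rangle, \langle v_0, u_2 \rangle, \langle v_1, u_3 \rangle \}$.
What type of correspondence is it?

Solution:

(a): Clearly $A \in P[V_I \times V_O]$, therefore A is an I/O correspondence
over $V_I \times V_O$.

(b): $\langle v_0, u_0 \rangle, \langle v_0, u_2 \rangle \in A$, and $u_0 \ne u_2$.
$\langle v_0, u_2 \rangle, \langle v_1, u_3 \rangle \in A$, and $u_2 \ne u_3$.
$\langle v_0, u_0 \rangle, \langle v_1, u_3 \rangle \in A$, and $u_0 \ne u_3$. Therefore A is
a nondestructive I/O correspondence over $V_I \times V_O$.

(c): The source set of A is $\{v_0, v_1\} = V_I$.

(d): The destination set of A is $\{u_0, u_2, u_3\} \subset V_O$.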

Example 4.5.10:

Let $V_I$ and $V_O$ be two sets, $V_I = \{v_0, v_1\}$, $V_O = \{u_0, u_1, u_2, u_3\}$. Consider
the set $B = \{ \langle v_0, u_0 \rangle, \langle v_0, u_2 \rangle, \langle v_1, u_2 \rangle \}$.
What type of correspondence is it?

Solution:

(a): Clearly $B \in P[V_I \times V_O]$, therefore B is an I/O correspondence
over $V_I \times V_O$.

(b): $\langle v_0, u_2 \rangle, \langle v_1, u_2 \rangle \in B$ and $u_2 = u_2$. Therefore B is not
a nondestructive I/O correspondence over $V_I \times V_O$, and B could
not represent a state of a reconfigurable network.
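
The checks carried out by hand in Examples 4.5.9 and 4.5.10 can also be
written as a short program. The following Python sketch is one possible
encoding, assuming that a correspondence is stored as a set of (input, output)
label pairs; the function names are hypothetical and serve only to mirror
Definitions 4.5.2, 4.5.3, 4.5.5, and 4.5.6.

```python
from itertools import combinations


def is_io_correspondence(pairs, v_in, v_out):
    """Definition 4.5.2: every pair lies in V_I x V_O."""
    return all(a in v_in and b in v_out for a, b in pairs)


def is_nondestructive(pairs):
    """Definition 4.5.3: no two distinct pairs share a destination."""
    return all(p[1] != q[1] for p, q in combinations(set(pairs), 2))


def source_set(pairs):
    """Definition 4.5.5: the inputs that appear in some pair."""
    return {a for a, _ in pairs}


def destination_set(pairs):
    """Definition 4.5.6: the outputs that appear in some pair."""
    return {b for _, b in pairs}


V_I = {"v0", "v1"}
V_O = {"u0", "u1", "u2", "u3"}
A = {("v0", "u0"), ("v0", "u2"), ("v1", "u3")}  # Example 4.5.9
B = {("v0", "u0"), ("v0", "u2"), ("v1", "u2")}  # Example 4.5.10

a_ok = is_io_correspondence(A, V_I, V_O) and is_nondestructive(A)
b_ok = is_io_correspondence(B, V_I, V_O) and is_nondestructive(B)
```

Here a_ok is true and b_ok is false: in A one input broadcasts to two outputs,
which is allowed, whereas in B two inputs drive the same output.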


4.6  Interconnection  Network  Model 

In this section, a formal graph/algebraic model of an interconnection
network is presented. Graph models for analyzing networks have been used by
other researchers. For example, in [Gok76, GoL73, LiM82, Upp81] they are
used to analyze regular SW Banyan networks, and in [FiF82] they are used to
study the partitioning of regular networks. The model presented here differs
from [Gok76, GoL73, LiM82, Upp81] and [FiF82] by being completely general,
so that it can be used to describe an arbitrary interconnection network,
whether topologically regular or irregular.

Certain relationships between networks that are of interest to the
computer system designer are presented here in a rigorous mathematical fashion.
In particular, the relationships subnetwork and equality are defined and their
properties described. At the end of the section, some examples of applications
are presented and an example is generalized into a theorem.


Definition 4.6.1:

Let $K = \langle C \rangle$ be such that:

(1) $C \subseteq C[V_I \times V_O]$.

(2) $V_I = s(C)$.

(3) $V_O = d(C)$.

(4) $|C| \ge 2$.

Then $K = \langle C \rangle$ is an I/O representation of a reconfigurable network
over $V_I \times V_O$.

Physical implications: $\langle v_a, v_b \rangle \in C_m$, $C_m \in C$ represents the network moving
data from input $v_a$ to output $v_b$ when the state of the network is $C_m$. C
represents the set of all possible states of the reconfigurable network. For an
example of a topologically arbitrary interconnection network see Figure 4.1.
The example has the following parameters:

$V_I = \{u_a, u_b, u_c\}$, $V_O = \{v_0, v_1\}$,

$C_0 = \{ \langle u_a, v_0 \rangle, \langle u_a, v_1 \rangle \}$,

$C_1 = \{ \langle u_a, v_0 \rangle, \langle u_b, v_1 \rangle \}$,

$C_2 = \{ \langle u_a, v_1 \rangle, \langle u_c, v_0 \rangle \}$,

$C = \{C_0, C_1, C_2\}$. $K = \langle \{C_0, C_1, C_2\} \rangle$.
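
The conditions of Definition 4.6.1 can be checked mechanically. The sketch
below is a minimal Python illustration. It assumes that the three states of the
example read $C_0 = \{\langle u_a, v_0 \rangle, \langle u_a, v_1 \rangle\}$,
$C_1 = \{\langle u_a, v_0 \rangle, \langle u_b, v_1 \rangle\}$, and
$C_2 = \{\langle u_a, v_1 \rangle, \langle u_c, v_0 \rangle\}$, and that a
reconfigurable network needs at least two states (the min_states parameter).
The function name is_network and the string labels are hypothetical.

```python
from itertools import combinations


def is_network(states, v_in, v_out, min_states=2):
    """Check the conditions of Definition 4.6.1 for a collection of states.

    states: iterable of sets of (input, output) pairs.
    min_states is an assumption about the minimum size of C.
    """
    states = {frozenset(s) for s in states}
    # (1) every state is a nondestructive I/O correspondence
    nondestructive = all(
        p[1] != q[1] for s in states for p, q in combinations(s, 2)
    )
    # (2) and (3): the states together use exactly the given inputs and outputs
    sources = {a for s in states for a, _ in s}
    destinations = {b for s in states for _, b in s}
    return (
        nondestructive
        and sources == set(v_in)
        and destinations == set(v_out)
        and len(states) >= min_states  # (4)
    )


V_I = {"ua", "ub", "uc"}
V_O = {"v0", "v1"}
C0 = {("ua", "v0"), ("ua", "v1")}
C1 = {("ua", "v0"), ("ub", "v1")}
C2 = {("ua", "v1"), ("uc", "v0")}

valid = is_network([C0, C1, C2], V_I, V_O)
```

Dropping $C_2$ makes the check fail, because input $u_c$ would then never be
used and condition (2) would be violated.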

Definition 4.6.2:

Let $K[V_I \times V_O] \triangleq \{ K \mid K = \langle C \rangle \text{ is a network over } V_I \times V_O \}$. Then
$K[V_I \times V_O]$ is called the K-set over $V_I \times V_O$.


Definitions 4.6.3 to 4.6.5 are used to classify formally the measure of
similarity between two networks. The classes are presented here in the order of
increasing strictness. Note that these relationships provide a refined scale of
the measure of similarity between two networks compared to the more
customary classification of isomorphic/nonisomorphic which was used in the
past. Several examples illustrating the application of these measures are given
after the definitions are presented. The examples are generalized into a
theorem relating the PM2I and k-dimensional Illiac networks.

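Definition 4.6.3:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$, be
two networks such that:

(1) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$.

(2) $\forall\, C_i^1 \in C^1\ \exists\, C_j^2 \in C^2$ such that $C_i^1 \subseteq C_j^2$.

Then $K^1$ is a subnetwork of type b of $K^2$. Notation: $K^1 \subseteq_b K^2$.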

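Definition 4.6.4:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks such that:

(1) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$.

(2) $\forall\, C_i^1 \in C^1\ \exists\, C_j^2 \in C^2$ such that $C_i^1 = C_j^2$.

Then $K^1$ is a subnetwork of type c of $K^2$. Notation: $K^1 \subseteq_c K^2$.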

Note:  The  reason  for  referring  to  these  subnetworks  as  types  b  and  c  is  to  make 
this  notation  consistent  with  the  definitions  of  subsystems  in  Section  4.7,  where 
the  three  types  of  subsystems  type  a,  b,  and  c  are  described. 

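Definition 4.6.5:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks such that:

(1) $V_I^1 = V_I^2$, $V_O^1 = V_O^2$.

(2) $C^1 = C^2$.

Then $K^1$ is equal to $K^2$. Notation: $K^1 = K^2$.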
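
The three relationships can be expressed compactly as set operations. The
following Python sketch assumes that a network is stored as a frozenset of
states, each state being a frozenset of (input, output) pairs; this encoding
and the function names are choices made for illustration only.

```python
def as_network(states):
    """Build a hashable network from an iterable of sets of pairs."""
    return frozenset(frozenset(s) for s in states)


def ports(net):
    """Return (source set, destination set) of a network."""
    sources = {a for state in net for a, _ in state}
    destinations = {b for state in net for _, b in state}
    return sources, destinations


def ports_contained(k1, k2):
    s1, d1 = ports(k1)
    s2, d2 = ports(k2)
    return s1 <= s2 and d1 <= d2


def is_subnetwork_b(k1, k2):
    """Definition 4.6.3: every state of k1 is contained in a state of k2."""
    return ports_contained(k1, k2) and all(
        any(c1 <= c2 for c2 in k2) for c1 in k1
    )


def is_subnetwork_c(k1, k2):
    """Definition 4.6.4: every state of k1 is itself a state of k2."""
    return ports_contained(k1, k2) and all(c1 in k2 for c1 in k1)


def is_equal(k1, k2):
    """Definition 4.6.5: same ports and same set of states."""
    return ports(k1) == ports(k2) and k1 == k2


# A two-input, two-output switch with a straight and an exchange state.
switch = as_network([{("a", "x"), ("b", "y")}, {("a", "y"), ("b", "x")}])
# A network that routes only one input to output x at a time.
partial = as_network([{("a", "x")}, {("b", "x")}])
```

In this small example partial is a type b subnetwork of switch but not a
type c subnetwork, since its states are proper subsets of the switch states.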

Theorems 4.6.6 to 4.6.8 give sufficient conditions for each of these
relationships to hold.

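Theorem 4.6.6:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks. If $\forall\, C_i^1 \in C^1\ \exists\, C_j^2 \in C^2$ such that $C_i^1 \subseteq C_j^2$, then
$K^1 \subseteq_b K^2$.

Proof:

(1): Show $V_I^1 \subseteq V_I^2$.

$(\forall\, C_i^1 \in C^1)(\exists\, C_j^2 \in C^2)$ such that $(C_i^1 \subseteq C_j^2)$

$\rightarrow (\forall\, C_i^1 \in C^1,\ C_i^1 \subseteq C_{j(i)}^2)$

$\rightarrow (\forall\, C_i^1 \in C^1,\ s(C_i^1) \subseteq s(C_{j(i)}^2))$

$\rightarrow (\bigcup_i s(C_i^1) \subseteq \bigcup_i s(C_{j(i)}^2))$

$\rightarrow (\bigcup_i s(C_i^1) \subseteq \bigcup_j s(C_j^2)) \rightarrow V_I^1 \subseteq V_I^2$.

(2): Show $V_O^1 \subseteq V_O^2$.

Similar to (1) except replace the s set by the d set.

□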


Theorem 4.6.7:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks. If $\forall\, C_i^1 \in C^1\ \exists\, C_j^2 \in C^2$ such that $C_i^1 = C_j^2$, then
$K^1 \subseteq_c K^2$.

Proof:

Show $V_I^1 \subseteq V_I^2$ and $V_O^1 \subseteq V_O^2$.

The proofs are similar to the proof of Theorem 4.6.6.

□

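Theorem 4.6.8:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks. If $C^1 = C^2$ then $K^1 = K^2$.

Proof:

(1): Show $V_I^1 = V_I^2$.

$C^1 = C^2 \rightarrow s(C^1) = s(C^2) \rightarrow V_I^1 = V_I^2$.

(2): Show $V_O^1 = V_O^2$.

$C^1 = C^2 \rightarrow d(C^1) = d(C^2) \rightarrow V_O^1 = V_O^2$.

□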

The following examples show an application of the similarity measure
between two networks. Note the increasing similarity between the PM2I and the
k-dimensional Illiac as the dimension k increases. The examples are generalized
into a theorem showing what happens in the limit as k increases to its
maximum.

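Example 4.6.9:

Consider the Illiac network with N = 64 processors. The network can
be modeled as follows.

$V_I^1 = \{u_j \mid j = 0, 1, \ldots, 63\}$, $V_O^1 = \{v_k \mid k = 0, 1, \ldots, 63\}$.

$C^1 = \{C_0^1, C_1^1, C_2^1, C_3^1\}$.

Let $\oplus$ denote addition modulo 64 and $\ominus$ subtraction modulo 64.

$C_0^1 = \{ \langle u_j, v_{j \oplus 1} \rangle \mid j = 0, 1, \ldots, 63 \}$,

$C_1^1 = \{ \langle u_j, v_{j \ominus 1} \rangle \mid j = 0, 1, \ldots, 63 \}$,

$C_2^1 = \{ \langle u_j, v_{j \oplus 8} \rangle \mid j = 0, 1, \ldots, 63 \}$,

$C_3^1 = \{ \langle u_j, v_{j \ominus 8} \rangle \mid j = 0, 1, \ldots, 63 \}$.

Then $K^1 = \langle C^1 \rangle$ describes the network.

Consider the single stage PM2I network with N = 64 processors. The
network can be described as follows.

$V_I^2 = \{u_j \mid j = 0, 1, \ldots, 63\}$, $V_O^2 = \{v_k \mid k = 0, 1, \ldots, 63\}$.

$C^2 = \{C_0^2, C_1^2, \ldots, C_{11}^2\}$,

$C_a^2 = \{ \langle u_j, v_{j \oplus 2^a} \rangle \mid j = 0, 1, \ldots, 63 \}$, $a = 0, 1, \ldots, 5$,

$C_{6+a}^2 = \{ \langle u_j, v_{j \ominus 2^a} \rangle \mid j = 0, 1, \ldots, 63 \}$, $a = 0, 1, \ldots, 5$.

Then $K^2 = \langle C^2 \rangle$ describes the network.

What is the relationship between the networks?

Solution:

(a): $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$.

(b): $\forall\, C_p^1 \in C^1\ \exists\, C_n^2 \in C^2$ such that $C_p^1 = C_n^2$. By Theorem 4.6.7 $K^1$ is
a subnetwork of type c of $K^2$, denoted by $K^1 \subseteq_c K^2$. Since
$C_1^2 \notin C^1$, $K^1 \ne K^2$. In the special case of N = 4 the
Illiac is equal to the PM2I.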
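
The claims of Example 4.6.9 can be verified by direct enumeration. The Python
sketch below encodes input $u_j$ and output $v_k$ by the integers j and k, so
that a state is a set of (j, k) pairs; this encoding and the function names
shift_state, illiac, and pm2i are assumptions made for illustration.

```python
def shift_state(n, offset):
    """State connecting input j to output (j + offset) mod n for every j."""
    return frozenset((j, (j + offset) % n) for j in range(n))


def illiac(n, side):
    """Illiac network on n processors with states +1, -1, +side, -side."""
    return frozenset(shift_state(n, o) for o in (1, -1, side, -side))


def pm2i(n):
    """Single stage PM2I network; n is assumed to be a power of two."""
    m = n.bit_length() - 1  # n = 2**m
    return frozenset(
        shift_state(n, sign * 2 ** i) for i in range(m) for sign in (1, -1)
    )


illiac_64 = illiac(64, 8)
pm2i_64 = pm2i(64)

# Every Illiac state is also a PM2I state (subnetwork of type c) ...
type_c = illiac_64 <= pm2i_64
# ... but the networks differ, for example in the +2 state.
equal_64 = illiac_64 == pm2i_64
# For N = 4 the two networks coincide.
equal_4 = illiac(4, 2) == pm2i(4)
```

Since both networks use all 64 inputs and outputs, the port conditions of
Definition 4.6.4 hold trivially and only the state sets need to be compared.
Note that the +32 and -32 states of the PM2I are the same set of pairs, so
pm2i_64 contains eleven distinct states.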

Example 4.6.10:

Consider the generalized three dimensional Illiac system with 64
processors, arranged as a 4 x 4 x 4 matrix. This network can be
modeled as follows.

$V_I^1 = \{u_j \mid j = 0, 1, \ldots, 63\}$, $V_O^1 = \{v_k \mid k = 0, 1, \ldots, 63\}$.

$C^1 = \{C_0^1, C_1^1, C_2^1, C_3^1, C_4^1, C_5^1\}$.

Let $\oplus$ denote addition modulo 64 and $\ominus$ subtraction modulo 64.

$C_a^1 = \{ \langle u_j, v_{j \oplus 4^a} \rangle \mid j = 0, 1, \ldots, 63 \}$, $a = 0, 1, 2$,

$C_{3+b}^1 = \{ \langle u_j, v_{j \ominus 4^b} \rangle \mid j = 0, 1, \ldots, 63 \}$, $b = 0, 1, 2$.

Then $K^1 = \langle C^1 \rangle$ describes the network.

Let $K^2 = \langle C^2 \rangle$ be the PM2I network with N = 64 as in Example
4.6.9.

What is the relationship between the networks?

Solution:

(a): $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$.

(b): $\forall\, C_p^1 \in C^1\ \exists\, C_n^2 \in C^2$ such that $C_p^1 = C_n^2$. By Theorem 4.6.7 $K^1$ is
a subnetwork of type c of $K^2$, denoted by $K^1 \subseteq_c K^2$. Since
$C_1^2 \notin C^1$, $K^1 \ne K^2$.

Example 4.6.11:

Consider the generalized six dimensional Illiac system with 64
processors, arranged as a 2 x 2 x 2 x 2 x 2 x 2 matrix. This network
can be modeled as follows.

$V_I^1 = \{u_j \mid j = 0, 1, \ldots, 63\}$, $V_O^1 = \{v_k \mid k = 0, 1, \ldots, 63\}$.

$C^1 = \{C_0^1, C_1^1, \ldots, C_{11}^1\}$.

Let $\oplus$ denote addition modulo 64 and $\ominus$ subtraction modulo 64.

$C_a^1 = \{ \langle u_j, v_{j \oplus 2^a} \rangle \mid j = 0, 1, \ldots, 63 \}$, $a = 0, 1, 2, 3, 4, 5$,

$C_{6+b}^1 = \{ \langle u_j, v_{j \ominus 2^b} \rangle \mid j = 0, 1, \ldots, 63 \}$, $b = 0, 1, 2, 3, 4, 5$.

Then $K^1 = \langle C^1 \rangle$ describes the network.

Let $K^2 = \langle C^2 \rangle$ be the PM2I network with N = 64 as in Example
4.6.9.

What is the relationship between the networks?

Solution:

$C^1 = C^2$, and Theorem 4.6.8 imply $K^1 = K^2$.

Theorem  4-6.12: 

Let  there  be  K*  =  <C‘>  a  PM2I  network  with  N  =  2®  processors, 
then  there  exist  K*  =  <C*>  a  generalized  Illiac  network  in  k 
dimensions,  such  that  K*  Cc  K^;  moreover  there  exists  k  =  m 
dimension  such  that  K*  =  K^.  Consequently  PM2I  can  be  viewed  as  a 
limiting  case  of  a  k-dimensional  Illiac  network. 

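Proof:

(1): Consider the generalized k-dimensional Illiac system with N
processors, arranged as a d x d x ... x d matrix (k factors). This
network can be modeled as follows.

$V_I^1 = \{u_j \mid j = 0,1,\ldots,N-1\}$, $V_O^1 = \{v_k \mid k = 0,1,\ldots,N-1\}$.

$C^1 = \{C^1_0, C^1_1, \ldots, C^1_{2k-1}\}$.

Let $\oplus$ denote addition modulo N and $\ominus$ subtraction modulo N.

Let $d = N^{1/k}$,

$C^1_a = \{\langle u_j, v_{j \oplus d^a} \rangle \mid a = 0,1,\ldots,k-1;\ j = 0,1,\ldots,N-1\}$,

$C^1_{k+b} = \{\langle u_j, v_{j \ominus d^b} \rangle \mid b = 0,1,\ldots,k-1;\ j = 0,1,\ldots,N-1\}$.

Then $K^1 = \langle C^1 \rangle$ describes the network.

(2): Consider the single stage PM2I network with N processors. The
network can be described as follows.

$V_I^2 = \{u_j \mid j = 0,1,\ldots,N-1\}$, $V_O^2 = \{v_k \mid k = 0,1,\ldots,N-1\}$.

$C^2 = \{C^2_0, C^2_1, \ldots, C^2_{2m-1}\}$.

$C^2_s = \{\langle u_j, v_{j \oplus 2^s} \rangle \mid s = 0,1,\ldots,m-1;\ j = 0,1,\ldots,N-1\}$,

$C^2_{m+t} = \{\langle u_j, v_{j \ominus 2^t} \rangle \mid t = 0,1,\ldots,m-1;\ j = 0,1,\ldots,N-1\}$.

Then $K^2 = \langle C^2 \rangle$ describes the network.

$N = 2^m$ and $N = d^k$ $\rightarrow$ $2^m = d^k$ $\rightarrow$ $d = 2^{m/k}$,

$\rightarrow$ $\frac{m}{k} = 1,2,\ldots,m$ $\rightarrow$ $d = 2,4,\ldots,N$.

(1), (2) $\rightarrow$ $\{C^1_a\} \subseteq \{C^2_s\}$ and $\{C^1_{k+b}\} \subseteq \{C^2_{m+t}\}$

$\rightarrow$ $\forall\, C^1_p \in C^1\ \exists\, C^2_n \in C^2 \ni:\ C^1_p = C^2_n$.

By Theorem 4.6.7, $K^1 \subseteq_c K^2$.

(1), (2), and d = 2 $\rightarrow$ $C^1_a = C^2_s$ and $C^1_{k+b} = C^2_{m+t}$ $\rightarrow$ $C^1 = C^2$.

$C^1 = C^2$ and Theorem 4.6.8 imply $K^1 = K^2$.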
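
The containment argument of this proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: states are sets of (input index, output index) pairs, the Illiac functions are the shifts by plus and minus d^a with d = 2^(m/k), and the helper names are hypothetical. It enumerates every k that divides m for N = 64 and records whether the Illiac states are contained in, or equal to, the PM2I states.

```python
def shift_state(n, offset):
    """One network state: input label j is connected to output label (j + offset) mod n."""
    return frozenset((j, (j + offset) % n) for j in range(n))


def illiac_states(n, d, k):
    """States +d^a and -d^a for a = 0..k-1 (assumes n == d**k)."""
    return {shift_state(n, sign * d ** a) for a in range(k) for sign in (1, -1)}


def pm2i_states(n, m):
    """States +2^i and -2^i for i = 0..m-1 (assumes n == 2**m)."""
    return {shift_state(n, sign * 2 ** i) for i in range(m) for sign in (1, -1)}


m = 6
n = 2 ** m
pm2i_set = pm2i_states(n, m)

report = {}
for k in range(1, m + 1):
    if m % k:
        continue  # d = 2^(m/k) is an integer only when k divides m
    d = 2 ** (m // k)
    states = illiac_states(n, d, k)
    # (side length d, contained in PM2I?, equal to PM2I?)
    report[k] = (d, states <= pm2i_set, states == pm2i_set)

for k, row in sorted(report.items()):
    print(k, row)
```

Only the case k = m (d = 2) yields equality; every other admissible k gives a proper subset of the PM2I states.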


4.7  Systems  and  Subsystems 


In this section the problem of modeling systems and subsystems is
discussed, together with the analysis of different relationships between
systems. A system, informally, consists of a set of devices, an
interconnection network, and a method of use of the network. A typical
device can be a processor, a memory, or a processor/memory pair. Each
device has two logical ports, input and output, possibly physically
implemented as the same set of physical I/O pins.


Three types of systems are recognized based upon the method of use.
Broadly speaking, a device can use the network in two basically different ways.
A device can have its output connected to the input of the network and its
input port connected to the output of the network. If this holds for all the
devices in the system, then this method results in a recirculating system.
From a communication point of view, these paths from the output of the
network through the devices back to the input of the network can be used to
generate different connection patterns using multiple passes through the
network. Alternatively, a system could be constructed with a device
where only the device's output is connected to the network,
but its input is from outside of the system. If this holds for all the devices in
the system, then this configuration results in a nonrecirculating system.
This can be used to model systems such as real time digital signal processing
systems. A real time digital signal processing system typically consists of
several functional sets of (one or more) processors, each set optimized to
perform a class of operations, together with an interconnection network
between each pair of functional sets. From the communication point of view,
these systems cannot generate different connection patterns using multiple
passes, because the paths from the outputs of the network to the network
inputs through the devices are missing. Hybrid systems in which some (but
not all) devices have return paths are also possible, for example a binary tree
type network, where the links are unidirectional and one of the leaves is
connected to the root device.

Several relationships between two systems can hold: the systems can be
completely different, they can be equal, or they can have some degree of
similarity. The relationship of subsystems is discussed here. Informally, if
there is a system over $V_I^1 \times V_O^1$, and another system over $V_I^2 \times V_O^2$, and
$V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$, then it is possible that the systems can be ordered in
some sense. The ordering considered here is the subsystem relationship defined
later. The idea is this. If both systems have the same method of use of the
network, and additionally the network of one system contains some or all the
states of the network of the other system, then the second system is in some
sense similar to the first. Using this concept, three different types of
subsystems are defined and some examples presented.

Definition 4.7.1:

Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$ be a network. If the usage of the
network is such that data outputted at $v_y \in V_O$ can be fed back in $v_x \in V_I$,
then $\langle v_x, v_y \rangle \in C_F$. $C_F$ is called the feedback correspondence.

Physical implications: This describes the situation where a processor or any
other device is connected to both $v_y$ and $v_x$. The device inputs data into $v_x \in V_I$
and receives data at $v_y \in V_O$. Thus if $\langle v_x, v_y \rangle \in C_F$ then the same device is
attached to $v_x$ and $v_y$. If $\langle v_x, v_y \rangle \notin C_F$ then a separate device is attached to
each of $v_x$ and $v_y$. Since it is assumed that each device has only one input and
one output, and that a vertex can have at most one device connected to it,
$C_F$ has the following properties:

(a) if $\langle v_x, v_y \rangle, \langle v_z, v_y \rangle \in C_F$ then $v_x = v_z$;

(b) if $\langle v_x, v_y \rangle, \langle v_x, v_z \rangle \in C_F$ then $v_y = v_z$;

(c) $C_F \subseteq V_I \times V_O$.


Theorem 4.7.2:

$C_F$ is a map, 1:1, onto from X to Y, where $X \subseteq V_I$ and $Y \subseteq V_O$.

Proof:

Obvious by definition of $C_F$ and properties (a), (b), (c).

□

Definition 4.7.3:

Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle = \langle \{C_m\} \rangle$ be a network, and let $C_F$
be a feedback correspondence, $C_F \subseteq V_I \times V_O$; then
$S = \langle C, C_F \rangle = \langle \{C_m\}, C_F \rangle$ is called the system over $V_I \times V_O$.

Physical implications: $C_F$ precisely describes the usage of a network in a
system. If $s(C_F) = V_I$ and $d(C_F) = V_O$, then the system is fully recirculating.
If $C_F \neq \emptyset$ and either $s(C_F) \neq V_I$ or $d(C_F) \neq V_O$ (or both), then the system is
partially recirculating. If $C_F = \emptyset$ then the system is nonrecirculating. An
example of a system is given in Figure 4.2. The properties of $C_F$ have
implications on whether the system can generate different correspondences by
using multiple passes through the network. Multiple passes require that
$C_F \neq \emptyset$, that is, the system must be partially or fully recirculating. At the end
of this section examples of each type of system are presented in detail.
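
The three cases distinguished above can be written down directly. The following is a minimal sketch, assuming that the feedback correspondence is given as a set of (input label, output label) pairs and that s and d return the sets of first and second components; the function names and the small label sets are hypothetical.

```python
def is_feedback_correspondence(v_in, v_out, c_f):
    """Check properties (a), (b), (c): each label occurs in at most one feedback pair."""
    c_f = set(c_f)
    inputs = [x for (x, _) in c_f]
    outputs = [y for (_, y) in c_f]
    return (len(set(outputs)) == len(outputs)      # (a): an output label is fed back to one input
            and len(set(inputs)) == len(inputs)    # (b): an input label is fed from one output
            and set(inputs) <= set(v_in)           # (c): pairs lie in V_I x V_O
            and set(outputs) <= set(v_out))


def classify(v_in, v_out, c_f):
    """Return the system type determined by the feedback correspondence."""
    c_f = set(c_f)
    if not c_f:
        return "nonrecirculating"
    sources = {x for (x, _) in c_f}
    destinations = {y for (_, y) in c_f}
    if sources == set(v_in) and destinations == set(v_out):
        return "fully recirculating"
    return "partially recirculating"


v_in = ["u0", "u1", "u2"]
v_out = ["v0", "v1", "v2"]
full = {("u0", "v0"), ("u1", "v1"), ("u2", "v2")}
partial = {("u1", "v0"), ("u2", "v1")}

print(classify(v_in, v_out, full))
print(classify(v_in, v_out, partial))
print(classify(v_in, v_out, set()))
```

Only the feedback correspondence enters the classification; the network states themselves play no role in deciding the system type.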

Definition 4.7.4:

The set $\{S \mid S = \langle C, C_F \rangle$ is a system over $V_I \times V_O\}$ is called the S-set
over $V_I \times V_O$, denoted by $S[V_I \times V_O]$.


Definition 4.7.5:

Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems. If (1)
$V_I^1 = V_I^2$, $V_O^1 = V_O^2$; (2) $C_F^1 = C_F^2$; and (3) $C^1 = C^2$, then $S^1$ is equal to
$S^2$. Notation: $S^1 = S^2$.

Physical implication: $S^1$ and $S^2$ are completely interchangeable.

Theorem 4.7.6:

Sufficiency condition for equality of systems. If (3) holds in Definition
4.7.5 then (1) holds.

Proof:

(a): Show: (3) $\rightarrow$ $V_I^1 = V_I^2$.

$V_I^1 = s(C^1) = s(C^2) = V_I^2$.

(b): Show: (3) $\rightarrow$ $V_O^1 = V_O^2$.

$V_O^1 = d(C^1) = d(C^2) = V_O^2$.

□

The implication of this theorem is that to check two systems for equality it is
only necessary to examine $C_F$ and $C$.
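
A minimal sketch of this equality test follows. It assumes a system is represented as a pair (set of states, feedback correspondence), that each state is a set of (input label, output label) pairs, and that s and d collect the input and output labels used by the states; the names `sources`, `destinations`, and `systems_equal` are hypothetical.

```python
def sources(states):
    """s(C): every input label that appears in some state of the network."""
    return {u for state in states for (u, _) in state}


def destinations(states):
    """d(C): every output label that appears in some state of the network."""
    return {v for state in states for (_, v) in state}


def systems_equal(system_1, system_2):
    """Conditions (2) and (3) only; condition (1) follows from (3)."""
    c_1, cf_1 = system_1
    c_2, cf_2 = system_2
    return c_1 == c_2 and cf_1 == cf_2


states = {
    frozenset({("u0", "v0"), ("u1", "v1")}),
    frozenset({("u0", "v1"), ("u1", "v0")}),
}
feedback = frozenset({("u0", "v0"), ("u1", "v1")})

s_1 = (states, feedback)
s_2 = (set(states), feedback)       # same content, built separately
s_3 = (states, frozenset())         # same network, used without feedback

print(systems_equal(s_1, s_2), systems_equal(s_1, s_3))
print(sources(s_1[0]) == sources(s_2[0]), destinations(s_1[0]) == destinations(s_2[0]))
```

The third system shows why the feedback correspondence cannot be dropped from the test: the networks agree, yet the systems differ.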

In the following part, the definitions of the different categories of relationship
between two systems are formally given. Note that the similarity relationship
here is an extension of the relationship between networks (Section 4.6) that
includes the comparison of the feedback correspondences. To facilitate the
understanding of the material, it is presented as follows. The categories of the
similarity relationship are presented in the order of increasing strictness.
Immediately after each formal definition, an example is presented.



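Definition 4.7.7:


Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems. If

(1) $V_I^1 \subseteq V_I^2$ and $V_O^1 \subseteq V_O^2$;

(2) $C_F^1 = C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$;

(3) $\forall\, C^1_i \in C^1\ \exists\, C^2_j \in C^2 \ni:\ C^1_i \subseteq C^2_j \cup C_F^2$

then $S^1$ is subsystem type a of $S^2$. ("$\ni:$" means "such that") Notation:
$S^1 \subseteq_a S^2$.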

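Example of subsystem type a.


Let

$S^2 = \langle C^2, C_F^2 \rangle$ be a system.

$V_I^2 = \{v_0, v_1, v_2\}$,

$V_O^2 = \{u_0, u_1, u_2\}$,

$C_F^2 = \{\langle v_0, u_0 \rangle, \langle v_1, u_1 \rangle, \langle v_2, u_2 \rangle\}$,

$C^2 = \{C^2_0, C^2_1, C^2_2\}$,

$C^2_0 = \{\langle v_0, u_0 \rangle, \langle v_0, u_1 \rangle, \langle v_2, u_2 \rangle\}$,
$C^2_1 = \{\langle v_1, u_1 \rangle, \langle v_1, u_2 \rangle\}$,

$C^2_2 = \{\langle v_2, u_1 \rangle, \langle v_2, u_2 \rangle\}$.


Let

$V_I^1 = \{v_0, v_1\}$,

$V_O^1 = \{u_0, u_1\}$,

$C_F^1 = \{\langle v_0, u_0 \rangle, \langle v_1, u_1 \rangle\}$,

$C^1 = \{C^1_0, C^1_1\}$,

$C^1_0 = \{\langle v_0, u_0 \rangle, \langle v_1, u_1 \rangle\}$,
$C^1_1 = \{\langle v_0, u_1 \rangle\}$.


Then

(1) $\langle C^1, C_F^1 \rangle$ is a system (denoted $S^1$).

(2) (a) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$

(b) $C_F^1 = C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$

(c) $C^1_0 \subseteq C^2_0 \cup C_F^2$, $C^1_1 \subseteq C^2_0 \subseteq C^2_0 \cup C_F^2$
$\rightarrow$ $S^1 \subseteq_a S^2$.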

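Definition 4.7.8:

Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems. If

(1) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$;

(2) $C_F^1 = C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$;

(3) $\forall\, C^1_i \in C^1\ \exists\, C^2_j \in C^2 \ni:\ C^1_i \subseteq C^2_j$

then $S^1$ is subsystem type b of $S^2$. Notation: $S^1 \subseteq_b S^2$.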

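Example of subsystem type b.


Let

$S^2 = \langle C^2, C_F^2 \rangle$ be a system.

$V_I^2 = \{v_0, v_1, v_2\}$,

$V_O^2 = \{u_0, u_1, u_2\}$,

$C_F^2 = \{\langle v_0, u_0 \rangle, \langle v_1, u_1 \rangle, \langle v_2, u_2 \rangle\}$,
$C^2 = \{C^2_0, C^2_1, C^2_2\}$,

$C^2_0 = \{\langle v_0, u_0 \rangle, \langle v_0, u_1 \rangle, \langle v_2, u_2 \rangle\}$,
$C^2_1 = \{\langle v_1, u_1 \rangle, \langle v_1, u_2 \rangle\}$,
$C^2_2 = \{\langle v_2, u_1 \rangle, \langle v_2, u_2 \rangle\}$.


Let

$V_I^1 = \{v_0, v_1\}$,

$V_O^1 = \{u_0, u_1\}$,

$C_F^1 = \{\langle v_0, u_1 \rangle\}$,

$C^1 = \{C^1_0, C^1_1\}$,

$C^1_0 = \{\langle v_0, u_0 \rangle\}$,

$C^1_1 = \{\langle v_0, u_0 \rangle, \langle v_0, u_1 \rangle\}$.


Then


(1) $\langle C^1, C_F^1 \rangle$ is a system (denoted $S^1$).

(2) (a) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$


(b) $C_F^1 = C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$

(c) $C^1_0 \subseteq C^2_0$, $C^1_1 \subseteq C^2_0$ $\rightarrow$ $S^1 \subseteq_b S^2$.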


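Definition 4.7.9:

Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems. If

(1) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$;

(2) $C_F^1 = C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$;

(3) $\forall\, C^1_i \in C^1\ \exists\, C^2_j \in C^2 \ni:\ C^1_i = C^2_j$

then $S^1$ is subsystem type c of $S^2$. Notation: $S^1 \subseteq_c S^2$.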

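Example of subsystem type c.

Let

$S^2 = \langle C^2, C_F^2 \rangle$ be a system.

$V_I^2 = \{v_0, v_1, v_2\}$,

$V_O^2 = \{u_0, u_1, u_2\}$,

$C_F^2 = \{\langle v_0, u_1 \rangle\}$,

$C^2 = \{C^2_0, C^2_1, C^2_2\}$,

$C^2_0 = \{\langle v_0, u_0 \rangle, \langle v_0, u_1 \rangle, \langle v_2, u_2 \rangle\}$,

$C^2_1 = \{\langle v_1, u_1 \rangle, \langle v_1, u_2 \rangle\}$,

$C^2_2 = \{\langle v_2, u_1 \rangle, \langle v_2, u_2 \rangle\}$.


Let

$V_I^1 = \{v_1, v_2\}$,

$V_O^1 = \{u_1, u_2\}$,

$C_F^1 = \{\langle v_1, u_1 \rangle\}$,

$C^1 = \{C^1_0, C^1_1\}$,

$C^1_0 = \{\langle v_1, u_1 \rangle, \langle v_1, u_2 \rangle\}$,
$C^1_1 = \{\langle v_2, u_1 \rangle, \langle v_2, u_2 \rangle\}$.

Then

(1) $\langle C^1, C_F^1 \rangle$ is a system (denoted $S^1$).

(2) (a) $V_I^1 \subseteq V_I^2$, $V_O^1 \subseteq V_O^2$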
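
The three subsystem types differ only in how strongly condition (3) ties each state of the smaller system to a state of the larger one. The sketch below checks condition (3) alone, on the states of the type a example as they are read here from the scanned text; conditions (1) and (2) are deliberately not modeled, and the function name `condition_3` and the string encoding of the labels are hypothetical.

```python
def condition_3(c_1, c_2, cf_2, kind):
    """Condition (3) for subsystem type 'a', 'b', or 'c'; states are frozensets of pairs."""
    def matches(state_1):
        if kind == "a":
            return any(state_1 <= (state_2 | cf_2) for state_2 in c_2)
        if kind == "b":
            return any(state_1 <= state_2 for state_2 in c_2)
        if kind == "c":
            return any(state_1 == state_2 for state_2 in c_2)
        raise ValueError("kind must be 'a', 'b', or 'c'")
    return all(matches(state_1) for state_1 in c_1)


# Larger system (states and feedback of the type a example).
c_2 = [
    frozenset({("v0", "u0"), ("v0", "u1"), ("v2", "u2")}),
    frozenset({("v1", "u1"), ("v1", "u2")}),
    frozenset({("v2", "u1"), ("v2", "u2")}),
]
cf_2 = frozenset({("v0", "u0"), ("v1", "u1"), ("v2", "u2")})

# Smaller system of the type a example.
c_1 = [
    frozenset({("v0", "u0"), ("v1", "u1")}),
    frozenset({("v0", "u1")}),
]

result = {kind: condition_3(c_1, c_2, cf_2, kind) for kind in "abc"}
print(result)
```

The first state of the smaller system needs the feedback pair to be covered, which is why it passes the type a test but fails the stricter type b and type c tests.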


Theorems 4.7.10 to 4.7.14 discuss the sufficiency conditions for the
different relationships between systems to hold.

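Theorem 4.7.10:

Sufficiency condition for subsystem type a.

If (2) and (3) hold in Definition 4.7.7 then (1) holds.

Proof:

(a): Show: (2), (3) $\rightarrow$ $V_I^1 \subseteq V_I^2$.

$S^1 \subseteq_a S^2$ $\rightarrow$ $\forall\, C^1_m \in C^1\ \exists\, C^2_n \in C^2 \ni:\ C^1_m \subseteq C^2_n \cup C_F^2$.

$V_I^1 = s(C^1) = s(\bigcup_m C^1_m) \subseteq s(\bigcup_m (C^2_n \cup C_F^2))$

$\rightarrow$ $s(\bigcup_m (C^2_n \cup C_F^2)) \subseteq s(\bigcup_n (C^2_n \cup C_F^2))$, $\forall\, C^2_n \in C^2$

$\rightarrow$ $s(\bigcup_n (C^2_n \cup C_F^2)) = s(\bigcup_n C^2_n) \cup s(C_F^2) = V_I^2$

$\rightarrow$ $V_I^1 \subseteq V_I^2$.

(b): Show: (2) and (3) $\rightarrow$ $V_O^1 \subseteq V_O^2$.

Similar to (a), with $s(C^1)$ and $s(C^2)$ replaced by $d(C^1)$ and $d(C^2)$
respectively.

□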

Theorem 4.7.11:

Sufficiency condition for subsystem type b.

If (3) holds in Definition 4.7.8 then (1) holds.

Proof:

Analogous to proof of Theorem 4.7.10 (note that (2) is not needed since
$C_F^2$ is not part of (3)).

□


Theorem 4.7.12:

Sufficiency condition for subsystem type c.

If (3) holds in Definition 4.7.9 then (1) holds.

Proof:

Analogous to proof of Theorem 4.7.11.

□


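Theorem 4.7.13:

Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems.

(1) If $S^1 = S^2$ then $S^1 \subseteq_c S^2$.

(2) If $S^1 \subseteq_c S^2$ then $S^1 \subseteq_b S^2$.

(3) If $S^1 \subseteq_b S^2$ then $S^1 \subseteq_a S^2$.

Proof:

Obvious, follows from definitions of subsystems.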


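Theorem 4.7.14:

Let $S^1 = \langle C^1, C_F^1 \rangle$ and $S^2 = \langle C^2, C_F^2 \rangle$ be two systems. If

(1) $S^1 \subseteq_c S^2$ and

(2) $S^2 \subseteq_c S^1$, then $S^1 = S^2$.

Proof:

Show:

(1): $V_I^1 = V_I^2$, $V_O^1 = V_O^2$;

(2): $C_F^1 = C_F^2$; and

(3): $C^1 = C^2$.

(a): Show: $V_I^1 = V_I^2$, $V_O^1 = V_O^2$.

From Theorem 4.7.12 it is known that (3) $\rightarrow$ (1), so only
(2) and (3) have to be shown.

(b): Show: $C_F^1 = C_F^2$.

$S^1 \subseteq_c S^2$ $\rightarrow$ $C_F^1 \subseteq C_F^2$; $S^2 \subseteq_c S^1$ $\rightarrow$ $C_F^2 \subseteq C_F^1$

$\rightarrow$ $C_F^1 = C_F^2$.

(c): Show: $C^1 = C^2$.

$\forall\, C^1_m \in C^1\ \exists$ unique $C^2_n \in C^2 \ni:\ C^1_m = C^2_n$.

Similarly $\forall\, C^2_n \in C^2\ \exists$ unique $C^1_p \in C^1 \ni:\ C^2_n = C^1_p$
$\rightarrow$ $C^1 = C^2$.

(d): $C_F^1 = C_F^2$, $C^1 = C^2$ $\rightarrow$ $\langle C^1, C_F^1 \rangle = \langle C^2, C_F^2 \rangle$.

□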

In the following part, detailed examples of the three types of systems (fully
recirculating, partially recirculating, and nonrecirculating) are given.


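Example 4.7.15:


Consider the following system:

$V_I = \{u_0, u_1, u_2, u_3\}$, $V_O = \{v_a, v_b, v_c, v_d\}$,

$C_1 = \{\langle u_0, v_a \rangle, \langle u_1, v_b \rangle, \langle u_2, v_c \rangle, \langle u_3, v_d \rangle\}$,
$C_2 = \{\langle u_0, v_c \rangle, \langle u_1, v_d \rangle, \langle u_2, v_a \rangle, \langle u_3, v_b \rangle\}$,
$C_F = \{\langle u_0, v_d \rangle, \langle u_1, v_a \rangle, \langle u_2, v_b \rangle, \langle u_3, v_c \rangle\}$,

$S \in S[\{u_0, u_1, u_2, u_3\} \times \{v_a, v_b, v_c, v_d\}]$,

$S = \langle C, C_F \rangle = \langle \{C_1, C_2\}, C_F \rangle$.


Find the type of the system.


Solution:


Based on the $C_F$, the system is a fully recirculating system. In
particular, the system is isomorphic to a bidirectional ring
$S = \langle \{R_{+1}, R_{-1}\}$, identity map $\rangle$. See Figure 4.3.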
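
The ring structure named in this solution can be made visible by relabeling each output label with the device that owns it. The sketch below is an illustration only: the subscripts of the correspondences are a reading of the scanned example and should be treated as an assumption, and the helper `device_map` is hypothetical. Device i is taken to be the one whose output port drives input label u_i.

```python
# Reading of the example (subscripts reconstructed from the scan; an assumption).
c_1 = {"u0": "va", "u1": "vb", "u2": "vc", "u3": "vd"}
c_2 = {"u0": "vc", "u1": "vd", "u2": "va", "u3": "vb"}
c_f = {"u0": "vd", "u1": "va", "u2": "vb", "u3": "vc"}


def device_map(state, feedback):
    """Rewrite a state as 'device i sends to device k' using the feedback pairs."""
    # The device that drives input label u_k also receives from the output label paired with it.
    owner = {v: int(u[1:]) for u, v in feedback.items()}
    return {int(u[1:]): owner[v] for u, v in state.items()}


ring_forward = device_map(c_1, c_f)
ring_backward = device_map(c_2, c_f)
print(ring_forward)
print(ring_backward)
```

Under this reading one state moves data from device i to device i+1 modulo 4 and the other to device i-1 modulo 4, which is the bidirectional ring behavior.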


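Example 4.7.16:


Consider the following system:

$V_I = \{u_0, u_1, u_2, u_3\}$, $V_O = \{v_a, v_b, v_c, v_d\}$,

$C_1 = \{\langle u_0, v_a \rangle, \langle u_1, v_b \rangle, \langle u_2, v_c \rangle, \langle u_3, v_d \rangle\}$,
$C_2 = \{\langle u_0, v_c \rangle, \langle u_1, v_d \rangle, \langle u_2, v_a \rangle, \langle u_3, v_b \rangle\}$,
$C_F = \{\langle u_1, v_a \rangle, \langle u_2, v_b \rangle, \langle u_3, v_c \rangle\}$,

$S \in S[\{u_0, u_1, u_2, u_3\} \times \{v_a, v_b, v_c, v_d\}]$,

$S = \langle C, C_F \rangle = \langle \{C_1, C_2\}, C_F \rangle$.

Find the type of the system.


Solution:


Based on the $C_F$, the system is a partially recirculating system. In
particular, the system is isomorphic to a reconfigurable pipeline with $C_1$
for algorithm 1 and $C_2$ for algorithm 2. See Figure 4.4.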

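Example 4.7.17:

Consider the following system:

$V_I = \{u_0, u_1, u_2, u_3\}$, $V_O = \{v_a, v_b, v_c, v_d\}$,

$C_1 = \{\langle u_0, v_a \rangle, \langle u_1, v_b \rangle, \langle u_2, v_c \rangle, \langle u_3, v_d \rangle\}$,

$C_2 = \{\langle u_0, v_c \rangle, \langle u_1, v_d \rangle, \langle u_2, v_a \rangle, \langle u_3, v_b \rangle\}$,

$C_F = \emptyset$,

$S \in S[\{u_0, u_1, u_2, u_3\} \times \{v_a, v_b, v_c, v_d\}]$,

$S = \langle C, C_F \rangle = \langle \{C_1, C_2\}, C_F \rangle$.

Find the type of the system.

Solution:

Based on the $C_F$, the system is a nonrecirculating system. In particular,
the system is isomorphic to a distributed signal processing system. See
Figure 4.5.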

Example 4.7.18:

Consider a system with three processor/memory pairs, where each
processor has a single physical port. The processors communicate via a
shared bus. The physical port can be reconfigured as either a logical input
or a logical output port.

Construct a model of this system.

Solution:

(1): Denote the input and output label sets by $V_I = \{u_0, u_1, u_2\}$ and
$V_O = \{v_0, v_1, v_2\}$. The device $d_i$ has its output port connected to
the input label $u_i$ and has its input port connected to the output
label $v_i$, i = 0,1,2.

(2): Based upon the given information, the feedback correspondence is

$C_F = \{\langle u_i, v_i \rangle \mid i = 0,1,2\}$.

(3): The states of the network are as follows.

$A_j(k) = \{\langle u_k, v_j \rangle\}$, $j,k = 0,1,2$; $k \neq j$.

Let $A = \{A_j(k) \mid k = 0,1,2\}$ (set of all 1:1 connections).

$B_{i,j}(k) = \{\langle u_k, v_i \rangle, \langle u_k, v_j \rangle\}$, $i,j,k = 0,1,2$; $k \neq i$; $k \neq j$; $i \neq j$.
Let $B = \{B_{i,j}(k) \mid k = 0,1,2\}$ (set of all two way broadcasts).

(4): The model then will be as follows.

$C = A \cup B$, $C_F = \{\langle u_i, v_i \rangle \mid i = 0,1,2\}$, and $S = \langle C, C_F \rangle$.
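
The state sets of this shared bus model are small enough to enumerate. The following is a minimal sketch, assuming that each 1:1 connection is a single pair from one sender to one other device and that each two way broadcast joins one sender to the two remaining devices; the variable names are hypothetical and labels are encoded as integer indices.

```python
from itertools import combinations

devices = [0, 1, 2]

# A: one sender k, one receiver j different from k (pairs are (input index, output index)).
one_to_one = {frozenset({(k, j)}) for k in devices for j in devices if j != k}

# B: one sender k broadcasting to two distinct receivers, both different from k.
broadcasts = {
    frozenset({(k, i), (k, j)})
    for k in devices
    for i, j in combinations([d for d in devices if d != k], 2)
}

states = one_to_one | broadcasts              # C = A union B
feedback = {(i, i) for i in devices}          # C_F pairs u_i with v_i

fully_recirculating = (
    {u for (u, _) in feedback} == set(devices)
    and {v for (_, v) in feedback} == set(devices)
)
print(len(one_to_one), len(broadcasts), len(states), fully_recirculating)
```

With three devices there are six 1:1 states and three broadcast states, and every device has a return path, so the modeled system is fully recirculating.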
Example 4.7.19:

Consider a system consisting of the Illiac network with N = 64
processors as in Example 4.6.9. The network is used in a fully
recirculating system. The system can be modeled as follows.

$V_I^1 = \{u_j \mid j = 0,1,\ldots,63\}$, $V_O^1 = \{v_k \mid k = 0,1,\ldots,63\}$.

$C^1 = \{C^1_0, C^1_1, C^1_2, C^1_3\}$.

Let $\oplus$ denote addition modulo 64 and $\ominus$ subtraction modulo 64.

$C^1_0 = \{\langle u_j, v_{j \oplus 1} \rangle \mid j = 0,1,\ldots,63\}$,

$C^1_1 = \{\langle u_j, v_{j \ominus 1} \rangle \mid j = 0,1,\ldots,63\}$,

$C^1_2 = \{\langle u_j, v_{j \oplus 8} \rangle \mid j = 0,1,\ldots,63\}$,

$C^1_3 = \{\langle u_j, v_{j \ominus 8} \rangle \mid j = 0,1,\ldots,63\}$.

$C_F^1 = \{\langle u_i, v_i \rangle \mid i = 0,1,\ldots,N-1\}$.

Then $S^1 = \langle C^1, C_F^1 \rangle$ describes the system.

Consider the single stage PM2I network with N = 64 processors as in
Example 4.6.9. The network is used in a nonrecirculating system. The
system can be described as follows.

$V_I^2 = \{u_j \mid j = 0,1,\ldots,63\}$, $V_O^2 = \{v_k \mid k = 0,1,\ldots,63\}$.

$C^2 = \{C^2_0, C^2_1, \ldots, C^2_{11}\}$,

$C^2_s = \{\langle u_j, v_{j \oplus 2^s} \rangle \mid s = 0,1,\ldots,5;\ j = 0,1,\ldots,63\}$,

$C^2_{6+t} = \{\langle u_j, v_{j \ominus 2^t} \rangle \mid t = 0,1,\ldots,5;\ j = 0,1,\ldots,63\}$.

$C_F^2 = \emptyset$.

Then $S^2 = \langle C^2, C_F^2 \rangle$ describes the system.


What is the relationship between the two systems?


Solution:


From Example 4.6.9 it was found that the Illiac is a subnetwork of type
c of PM2I, but it would not be correct to conclude that $S^1 \subseteq_c S^2$, since
$C_F^1 \neq C_F^2 \,|\, (V_I^1 \times V_O^2) \cup (V_I^2 \times V_O^1)$. For example, $S^1$ is capable of
executing the interconnection function
$A_3 = \{\langle u_j, v_{j \oplus 3} \rangle \mid j = 0,1,\ldots,N-1\}$ using multiple passes through the
network. $S^2$ is not capable of executing the function $A_3$ because it is a
nonrecirculating system and therefore not capable of multiple passes
through the network.

It is important, therefore, to consider the feedback connections when
evaluating the relationships between systems. One must conclude
that a comparison between systems is not meaningful if
the systems use their respective networks differently.
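
The effect of feedback on the set of realizable functions can be illustrated with uniform shifts. The sketch below rests on stated assumptions: every state is a shift by a fixed offset modulo 64, the feedback correspondence is the identity so that consecutive passes add their offsets, and the helper `reachable_offsets` is hypothetical.

```python
def reachable_offsets(single_pass, n, passes):
    """Offsets obtainable with at most `passes` passes, assuming identity feedback."""
    reached = set(single_pass)
    frontier = set(single_pass)
    for _ in range(passes - 1):
        # One more pass composes each reachable shift with one single-pass shift.
        frontier = {(a + b) % n for a in frontier for b in single_pass}
        reached |= frontier
    return reached


n = 64
illiac_offsets = {1 % n, -1 % n, 8 % n, -8 % n}
pm2i_offsets = {(sign * 2 ** i) % n for i in range(6) for sign in (1, -1)}

shift_3_in_pm2i_single_pass = 3 in pm2i_offsets
shift_3_in_illiac_two_passes = 3 in reachable_offsets(illiac_offsets, n, 2)
shift_3_in_illiac_three_passes = 3 in reachable_offsets(illiac_offsets, n, 3)
print(shift_3_in_pm2i_single_pass,
      shift_3_in_illiac_two_passes,
      shift_3_in_illiac_three_passes)
```

The shift by 3 is absent from the single-pass PM2I states but appears after three passes of the recirculating Illiac system, which is the asymmetry the feedback correspondence captures.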




4.8  Conclusions 


In this chapter the following problems were addressed. A general,
implementation independent model of topologically arbitrary
interconnection networks was developed. Several important relationships
between networks, such as equality and subnetworks, were rigorously formulated.
A similarity relationship between networks was defined. The relationship has
the following categories in the order of decreasing strictness: (a) networks are
equal, (b) $K^1$ is subnetwork of type c of $K^2$, (c) $K^1$ is subnetwork of type b of
$K^2$, and (d) none of the above. Note that this is an extension of the previously
used method which categorizes networks into two classes only: isomorphic and
nonisomorphic.

A system and different types of subsystems were defined. A system,
informally, consists of a set of devices, an interconnection network, and the
method of use of the network by the devices. Three different types of systems
were defined, based upon the method of use of the network, and several
relationships between two systems were analyzed. The system types
recognized are recirculating, nonrecirculating, and partially recirculating. In a
recirculating system each device $d_i$ has its logical output port connected to an
input label of the interconnection network and its input port connected to an
output label of the interconnection network. For a fully recirculating system
$|V_I| = |V_O|$. A partially recirculating system contains some, but not all,
devices each of which has its output port connected to a network input label
and its input port connected to an output label of the network. If
$|V_I| \neq |V_O|$, then the system cannot be fully recirculating and can be only
partially recirculating or nonrecirculating, because each device has only one
input port and one output port. In a fully or partially recirculating system it is
possible to generate different connection patterns using multiple passes through
the network. In a nonrecirculating system, each device is connected only to a
network input or (exclusive) to a network output. This type of configuration
appears frequently in real time digital signal processing systems. The result of
this configuration is that no new connection patterns can be achieved by
multiple passes, since it is not possible to move the data from the output of the
network back to its input.

A similarity relationship between systems was defined. It is an extension
of the classification of networks which takes into consideration the $C_F$
properties. The relationship has the following categories in the order of
decreasing strictness: (a) systems are equal, (b) $S^1$ is subsystem of type c of $S^2$,
(c) $S^1$ is subsystem of type b of $S^2$, (d) $S^1$ is subsystem of type a of $S^2$, and (e)
none of the above. Note that this is an extension of the previously used
method which categorizes systems into two classes only: isomorphic and
nonisomorphic.


5.1 Introduction


In Chapter 4 a restricted problem of a measure of similarity between two
systems was studied. It was assumed that given a system $S^1$ over $V_I^1 \times V_O^1$
and $S^2$ over $V_I^2 \times V_O^2$, the labeling is such that $V_I^1 \subseteq V_I^2$ and $V_O^1 \subseteq V_O^2$. If
the above does not hold, that does not mean that the two systems are dissimilar;
it could just mean that the $V_I$ and $V_O$ labeling is not helpful.

To  study  the  problem  of  comparison  of  randomly  labeled  systems  the 
concept  of  quasimorphism  of  systems  was  developed  [SeS84a].  This  concept  is 
related  to  the  classification  of  groups  in  the  field  of  abstract  algebra  and  group 
theory.  The  theory  of  group  classification  is  based  upon  the  concept  of 
morphism.  Morphism  is  a  measure  of  similarity  of  behaviors  of  group 
operations  of  two  groups.  This  measure  ignores  the  labeling  of  the  elements  of 
the  groups  and  is  concerned  strictly  with  the  structure  which  is  determined  by 
the  group  operation. 

In the domain of parallel computer systems the structure of interest is the
structure of the correspondences of the system's network in the graph
theoretical sense. The quasimorphism of systems allows a method of
comparison of randomly labeled, topologically arbitrary parallel computer
systems. The quasimorphism facilitates the analysis of the following problems in
parallel processing:


(a)  system  A  emulating  system  B  (three  different  degrees  of  strictness  of 
emulation  are  discussed); 


(b) fault tolerance/reliability (achieved by multiple mapping of the same
problem into a system);

(c)  partitioning  of  a  system. 

The  quasimorphism  is  analyzed  with  respect  to  properties  similar  to  the 
properties  of  reflexivity,  symmetry,  and  transitivity.  Several  examples  of 
quasimorphism  of  different  types  are  presented. 

Also  in  this  chapter  the  problem  of  emulation  of  one  system  by  another  is 
discussed.  Three  different  types  of  emulation  are  considered.  Several  measures 
of  efficiency  of  emulation  are  defined  and  the  three  types  of  emulation  are 
evaluated  using  these  criteria.  Two  examples  of  emulation  of  arbitrary  systems 
are  presented. 


5.2  Overview 

In  Section  5.3  the  problems  discussed  in  this  chapter  are  defined.  In 
Section  5.4,  the  previous  related  work  is  described.  In  Section  5.5  the  basic 
definitions  and  concepts  are  given.  In  Section  5.6  the  measure  of  similarity  of 
systems  called  quasimorphism  is  developed.  In  Section  5.7  an  application  of 
quasimorphism  in  emulation  of  one  system  by  another  is  researched.  In  Section 
5.8  the  conclusions  of  this  chapter  are  given. 


5.3 Problem Statement


It is intuitively obvious that some systems have different topologies than
others, yet not much has been done in past research to quantify the
differences. In the past, researchers used only two wide categories: two
networks are isomorphic or two networks are not isomorphic. In this chapter a
refinement of the measure of similarity between two systems is explored. In the
domain of parallel computer systems the structure of interest is the structure of
the correspondences of the system's network in the graph theoretical sense.
The dynamic behavior of the system's reconfigurable network, which generates a
set of correspondences as a function of the control strategy, must be taken into
consideration. Based upon the idea of morphism of groups in group theory, the
concept of morphism of parallel computer system topology is developed. This
measure is called quasimorphism. It allows a comparison of topologically
arbitrary parallel computer systems. The measure is used in the analysis of
emulation of one system by another. Three different types of emulation are
defined and their properties are explored.




5.4 Previous Work


The following are the three research areas related to the topics explored in
this chapter. The simple (two class only) similarity measure between two
networks was used to show that the multi-stage Shuffle-Exchange network is
isomorphic to the n-Cube [Law75]. The emulation definition here is a
generalized form of the definition used in [FiF82] to study quotient networks.
Another work related to the research developed here can be found in the
classification of groups in the field of abstract algebra and group theory
[Han68, Her75]. The theory of group classification is based upon the concept of
morphism. Morphism is a measure of similarity between behaviors of two
groups. This measure ignores the labeling of the elements of the groups and is
concerned strictly with the structure, which is determined by the group
operation.


5.5 Basic Concepts


The analysis of relationships between systems can be described
mathematically as finding correspondences between two sets of systems, or
S-sets. This problem is very complex to handle directly and therefore it will be
broken into two major parts. Note that each system is defined over an
underlying $V_I \times V_O$. A structure called a T-element will be defined over the
same underlying $V_I \times V_O$ set. The T-element has fewer constraints than a
system and therefore is easier to analyze.

The first major part, presented in this section, will then be the analysis of
relationships between T-elements. This major part will in turn be broken into
finding relationships between the underlying substructures of the T-elements.
The set of all T-elements over a particular $V_I \times V_O$ is called the T-set.

The  second  major  part,  presented  in  Section  5.6,  will  consist  of  analysis  of 
relationships  between  two  S-sets.  Section  5.6  will  use  the  relationships  derived 
in  this  section  to  discuss  the  relationships  between  S-sets.  As  intended  some 
relationships  between  two  T-sets  will  be  directly  applicable  to  the  relationships 
between  two  S-sets,  and  some  others  will  be  applicable  in  somewhat  weakened 
form. 

To resolve the ambiguity in the notation $\{\langle u_a, u_b \rangle\}$, assume that it will
indicate a set of pairs unless specifically described as a singleton. Definition
5.5.1 identifies the universe of discourse for this section.

Definition 5.5.1:

Let $T[V_I \times V_O] = \{\langle \{E_m \mid m = 1,2,\ldots,n\}, E_F \rangle \mid E_m \in P[V_I \times V_O],$
$E_F \in P[V_I \times V_O]\}$. Then $T[V_I \times V_O]$ is called the T-set over $V_I \times V_O$.

The maps $\phi_I$ and $\phi_O$ are the basic elements in the discussion of the
relationship between two T-elements. Since the analysis is very complex, some
auxiliary intermediate maps and correspondences are defined in Definitions
5.5.3 to 5.5.5. For a pictorial representation of the genealogy of these maps
and correspondences see Figure 5.1.


Figure 5.1:

Genealogy of the maps and correspondences.


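Definition 5.5.2:


Define $\phi_I$-map and $\phi_O$-map as follows:

$\phi_I : V_I^1 \rightarrow \phi_I(V_I^1)$, a map; and $\phi_O : V_O^1 \rightarrow \phi_O(V_O^1)$, a map.

Definition 5.5.3:

Let $\phi_I : V_I^1 \rightarrow \phi_I(V_I^1)$ be a map and $\phi_O : V_O^1 \rightarrow \phi_O(V_O^1)$ a map.

Define a $\phi_{I \times O}$-map from $V_I^1 \times V_O^1$ to $\phi_I(V_I^1) \times \phi_O(V_O^1)$ to be a
map such that:

$\phi_{I \times O} : V_I^1 \times V_O^1 \rightarrow \phi_I(V_I^1) \times \phi_O(V_O^1)$, $\forall\, \langle v_a, v_b \rangle \in V_I^1 \times V_O^1$,

$\phi_{I \times O}(\langle v_a, v_b \rangle) = \langle \phi_I(v_a), \phi_O(v_b) \rangle$.

Note that $\phi_{I \times O}$ is generated by $\phi_I$ and $\phi_O$, which given $\phi_{I \times O}$ are clearly
unique by definition. Clearly $\phi_{I \times O}$ is an onto map.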

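Definition 5.5.4:

Let $\phi_{I \times O}$ be a $\phi_{I \times O}$-map from $V_I^1 \times V_O^1$ to $\phi_I(V_I^1) \times \phi_O(V_O^1)$. Define a
$\mu$-map from $P[V_I^1 \times V_O^1]$ to $P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$ to be any map
such that:

$\mu : P[V_I^1 \times V_O^1] \rightarrow P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$,

$\forall\, E = \{\langle v_a, v_b \rangle\} \in P[V_I^1 \times V_O^1]$,

$\mu(E) = \mu(\{\langle v_a, v_b \rangle\}) = \{\phi_{I \times O}(\langle v_a, v_b \rangle)\}$.

Note that $\mu$ is generated by $\phi_{I \times O}$. Clearly $\mu$ is an onto map.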

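Definition 5.5.5:

Let $T^1 = T[V_I^1 \times V_O^1]$ and $T^2 = T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$ be two T-sets.
Let $\mu$ be a $\mu$-map from $P[V_I^1 \times V_O^1]$ to $P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$. Define a
$\psi$-map from $T^1$ to $T^2$ to be any map such that:

$\psi : T^1 \rightarrow T^2$, $\forall\, \langle \{E_m \mid m = 1,2,\ldots,n\}, E_F \rangle \in T^1$,

$\psi(\langle \{E_m \mid m = 1,2,\ldots,n\}, E_F \rangle) = \langle \{\mu(E_m) \mid m = 1,2,\ldots,n\}, \mu(E_F) \rangle$.

Note that $\psi$ is generated by $\mu$. Clearly $\psi$ is an onto map.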
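
The chain of generated maps, from the label maps to the product map, then to the map on sets of pairs, then to the map on T-elements, can be sketched as follows. This is an illustration under stated assumptions: label maps are dictionaries, a T-element is a pair (set of states, feedback set), and the function names `phi_product`, `mu`, and `psi` as well as the example labels are hypothetical.

```python
def phi_product(phi_i, phi_o):
    """The product map: relabel a pair componentwise."""
    return lambda pair: (phi_i[pair[0]], phi_o[pair[1]])


def mu(phi_io, pairs):
    """Image of a set of pairs under the product map."""
    return frozenset(phi_io(pair) for pair in pairs)


def psi(phi_io, t_element):
    """Image of a T-element: relabel every state and the feedback set."""
    states, feedback = t_element
    return (frozenset(mu(phi_io, state) for state in states), mu(phi_io, feedback))


phi_i = {"u0": "a0", "u1": "a1"}
phi_o = {"v0": "b0", "v1": "b1"}
phi_io = phi_product(phi_i, phi_o)

t_element = (
    frozenset({
        frozenset({("u0", "v0")}),
        frozenset({("u0", "v1"), ("u1", "v0")}),
    }),
    frozenset({("u0", "v0"), ("u1", "v1")}),
)

image = psi(phi_io, t_element)
print(image)
```

Each level is defined purely in terms of the one below it, which is why properties such as being 1:1 can be passed up and down the chain.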

Unless otherwise noted, for the rest of this chapter this notation will be used:

$P^1 = P[V_I^1 \times V_O^1]$, $P^2 = P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.
$C^1 = C[V_I^1 \times V_O^1]$, $C^2 = C[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.
$K^1 = K[V_I^1 \times V_O^1]$, $K^2 = K[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.
$S^1 = S[V_I^1 \times V_O^1]$, $S^2 = S[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.
$T^1 = T[V_I^1 \times V_O^1]$, $T^2 = T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.

The Lemmas and Theorems 5.5.6 to 5.5.10 discuss the heritage of some
properties between the auxiliary maps and correspondences. On their own these
results are not of practical importance; however, the results will be used in
Section 5.7 to discuss the properties of quasimorphism, which is the main goal
of these two sections.

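Lemma 5.5.6:

Let $\phi_I$ and $\phi_O$ be the maps generating $\phi_{I \times O}$.

$\phi_I$ and $\phi_O$ are 1:1 maps iff $\phi_{I \times O}$ is a 1:1 map.

Proof:

Let $\phi_{I \times O} : V_I^1 \times V_O^1 \rightarrow \phi_I(V_I^1) \times \phi_O(V_O^1)$ be the $\phi_{I \times O}$-map.

Case 1: Show: $\forall\, \langle u_a, u_b \rangle, \langle v_a, v_b \rangle \in V_I^1 \times V_O^1$,

$\phi_{I \times O}(\langle u_a, u_b \rangle) = \phi_{I \times O}(\langle v_a, v_b \rangle)$ $\rightarrow$ $\langle u_a, u_b \rangle = \langle v_a, v_b \rangle$.

(1): $\phi_{I \times O}(\langle u_a, u_b \rangle) = \langle \phi_I(u_a), \phi_O(u_b) \rangle$, $\phi_{I \times O}(\langle v_a, v_b \rangle) =$
$\langle \phi_I(v_a), \phi_O(v_b) \rangle$, and $\phi_{I \times O}(\langle u_a, u_b \rangle) = \phi_{I \times O}(\langle v_a, v_b \rangle)$

$\rightarrow$ $\langle \phi_I(u_a), \phi_O(u_b) \rangle = \langle \phi_I(v_a), \phi_O(v_b) \rangle$ $\rightarrow$ $\phi_I(u_a) = \phi_I(v_a)$,
$\phi_O(u_b) = \phi_O(v_b)$.

(2): (1) and $\phi_I$, $\phi_O$ 1:1 $\rightarrow$ $u_a = v_a$, $u_b = v_b$

$\rightarrow$ $\langle u_a, u_b \rangle = \langle v_a, v_b \rangle$.


Case 2: Show: $\forall\, u_a, v_a \in V_I^1$, $\phi_I(u_a) = \phi_I(v_a)$ $\rightarrow$ $u_a = v_a$;
and $\forall\, u_b, v_b \in V_O^1$, $\phi_O(u_b) = \phi_O(v_b)$ $\rightarrow$ $u_b = v_b$.

(1): $\phi_I(u_a) = \phi_I(v_a)$, $\phi_O(u_b) = \phi_O(v_b)$ $\rightarrow$ $\langle \phi_I(u_a), \phi_O(u_b) \rangle =$
$\langle \phi_I(v_a), \phi_O(v_b) \rangle$ $\rightarrow$ $\phi_{I \times O}(\langle u_a, u_b \rangle) = \phi_{I \times O}(\langle v_a, v_b \rangle)$.

(2): $\phi_{I \times O}$ 1:1 and (1) $\rightarrow$ $\langle u_a, u_b \rangle = \langle v_a, v_b \rangle$

$\rightarrow$ $u_a = v_a$, $u_b = v_b$.

□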
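
The statement of this lemma can be confirmed exhaustively on small label sets. The sketch below assumes label maps are dictionaries over nonempty domains and enumerates every pair of maps between small hypothetical label sets; the helper names are not from the thesis.

```python
from itertools import product


def is_injective(mapping):
    """A finite map is 1:1 when no two keys share a value."""
    return len(set(mapping.values())) == len(mapping)


def product_map(phi_i, phi_o):
    """The product map as an explicit dictionary on all pairs."""
    return {(a, b): (phi_i[a], phi_o[b]) for a in phi_i for b in phi_o}


def all_maps(domain, codomain):
    """Every map from domain to codomain, each as a dictionary."""
    return [dict(zip(domain, images)) for images in product(codomain, repeat=len(domain))]


maps_i = all_maps(["u0", "u1"], ["x0", "x1"])
maps_o = all_maps(["v0", "v1", "v2"], ["y0", "y1", "y2"])

all_agree = all(
    (is_injective(phi_i) and is_injective(phi_o))
    == is_injective(product_map(phi_i, phi_o))
    for phi_i in maps_i
    for phi_o in maps_o
)
print(len(maps_i), len(maps_o), all_agree)
```

The exhaustive check covers both directions of the "iff" at once, since it compares the two truth values for every pair of maps.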

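Lemma 5.5.7:

Let $\phi_{I \times O}$ be the $\phi_{I \times O}$-map generating $\mu$.

$\phi_{I \times O}$ is a 1:1 map iff $\mu$ is a 1:1 map.

Proof:

Let $\mu : P^1 \rightarrow P^2$ be the $\mu$-map.

Case 1: Show: $\forall\, E_m, E_n \in P^1$, $\mu(E_m) = \mu(E_n)$ $\rightarrow$ $E_m = E_n$.

(1): Let $E_m = \{\langle u_a, u_b \rangle\}$ and $E_n = \{\langle v_a, v_b \rangle\}$.

$\mu(E_m) = \mu(\{\langle u_a, u_b \rangle\}) = \{\phi_{I \times O}(\langle u_a, u_b \rangle)\}$,

$\mu(E_n) = \mu(\{\langle v_a, v_b \rangle\}) = \{\phi_{I \times O}(\langle v_a, v_b \rangle)\}$ and $\mu(E_m) = \mu(E_n)$

$\rightarrow$ $\{\phi_{I \times O}(\langle u_a, u_b \rangle)\} = \{\phi_{I \times O}(\langle v_a, v_b \rangle)\}$.

(2): Show $E_m \subseteq E_n$.

(2a): $\langle u_a^1, u_b^1 \rangle \in E_m$ $\rightarrow$ $\phi_{I \times O}(\langle u_a^1, u_b^1 \rangle) \in \mu(E_m)$

$\rightarrow$ $\phi_{I \times O}(\langle u_a^1, u_b^1 \rangle) \in \{\phi_{I \times O}(\langle u_a, u_b \rangle)\}$.

(2b): $\mu(E_m) = \mu(E_n)$ and (2a) $\rightarrow$ $\phi_{I \times O}(\langle u_a^1, u_b^1 \rangle) \in \mu(E_n)$

$\rightarrow$ $\exists\, \langle v_a^1, v_b^1 \rangle \in E_n \ni:\ \phi_{I \times O}(\langle u_a^1, u_b^1 \rangle) = \phi_{I \times O}(\langle v_a^1, v_b^1 \rangle)$.

(2c): (2b) and $\phi_{I \times O}$ 1:1 $\rightarrow$ $\langle u_a^1, u_b^1 \rangle = \langle v_a^1, v_b^1 \rangle \in E_n$ $\rightarrow$ $E_m \subseteq E_n$.

(3): Similarly $E_n \subseteq E_m$.

(4): (2c), (3) $\rightarrow$ $E_m = E_n$.

Case 2: Show: $\forall\, \langle u_a, u_b \rangle, \langle v_a, v_b \rangle \in V_I^1 \times V_O^1$, $\phi_{I \times O}(\langle u_a, u_b \rangle) =$
$\phi_{I \times O}(\langle v_a, v_b \rangle)$ $\rightarrow$ $\langle u_a, u_b \rangle = \langle v_a, v_b \rangle$.

(1): Let $E_m, E_n \in P^1$, $E_m = \{\langle u_a, u_b \rangle \mid$ singleton$\}$, $E_n = \{\langle v_a, v_b \rangle \mid$
singleton$\}$. $\phi_{I \times O}(\langle u_a, u_b \rangle) = \phi_{I \times O}(\langle v_a, v_b \rangle)$ $\rightarrow$
$\{\phi_{I \times O}(\langle u_a, u_b \rangle) \mid$ singleton$\}$ $= \{\phi_{I \times O}(\langle v_a, v_b \rangle) \mid$ singleton$\}$ $\rightarrow$
$\mu(E_m) = \mu(E_n)$.

(2): $\mu$ 1:1 and (1) $\rightarrow$ $E_m = E_n$ $\rightarrow$ $\{\langle u_a, u_b \rangle \mid$ singleton$\}$ $=$
$\{\langle v_a, v_b \rangle \mid$ singleton$\}$ $\rightarrow$ $\langle u_a, u_b \rangle = \langle v_a, v_b \rangle$.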


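Lemma 5.5.8:

Let $\mu$ be the $\mu$-map generating $\psi$.

$\mu$ is 1:1 map iff $\psi$ is 1:1 map.

Proof:

Let $\psi : T^1 \rightarrow T^2$ be the $\psi$-map.

Case 1: Show: $\forall T^{1,i}, T^{1,j} \in T^1$, $\psi(T^{1,i}) = \psi(T^{1,j}) \rightarrow T^{1,i} = T^{1,j}$.

(1): Let $T^{1,i} = \langle \{E_m^{1,i} \mid m=1,2,...p\}, E_F^{1,i} \rangle$
and $T^{1,j} = \langle \{E_n^{1,j} \mid n=1,2,...q\}, E_F^{1,j} \rangle$.
$\psi(T^{1,i}) = \psi(\langle \{E_m^{1,i} \mid m=1,2,...p\}, E_F^{1,i} \rangle) = \langle \{\mu(E_m^{1,i}) \mid m=1,2,...p\}, \mu(E_F^{1,i}) \rangle$,
$\psi(T^{1,j}) = \psi(\langle \{E_n^{1,j} \mid n=1,2,...q\}, E_F^{1,j} \rangle) = \langle \{\mu(E_n^{1,j}) \mid n=1,2,...q\}, \mu(E_F^{1,j}) \rangle$ and
$\psi(T^{1,i}) = \psi(T^{1,j}) \rightarrow \langle \{\mu(E_m^{1,i}) \mid m=1,2,...p\}, \mu(E_F^{1,i}) \rangle = \langle \{\mu(E_n^{1,j}) \mid n=1,2,...q\}, \mu(E_F^{1,j}) \rangle$.

(2): (1) $\rightarrow \mu(E_F^{1,i}) = \mu(E_F^{1,j})$.

(2a): (2) and $\mu$ 1:1 $\rightarrow E_F^{1,i} = E_F^{1,j}$.

(3): Show $\{E_m^{1,i} \mid m=1,2,...p\} \subseteq \{E_n^{1,j} \mid n=1,2,...q\}$.

(3a): $E_k^{1,i} \in \{E_m^{1,i} \mid m=1,2,...p\} \rightarrow \mu(E_k^{1,i}) \in \{\mu(E_m^{1,i}) \mid m=1,2,...p\}$.

(3b): (3a), (1) $\rightarrow \mu(E_k^{1,i}) \in \{\mu(E_n^{1,j}) \mid n=1,2,...q\}$
$\rightarrow \exists E_l^{1,j} \in \{E_n^{1,j} \mid n=1,2,...q\} \ni: \mu(E_k^{1,i}) = \mu(E_l^{1,j})$.

(3c): (3b) and $\mu$ 1:1 $\rightarrow E_k^{1,i} = E_l^{1,j} \in \{E_n^{1,j} \mid n=1,2,...q\}$
$\rightarrow \{E_m^{1,i} \mid m=1,2,...p\} \subseteq \{E_n^{1,j} \mid n=1,2,...q\}$.

(4): Similarly $\{E_n^{1,j} \mid n=1,2,...q\} \subseteq \{E_m^{1,i} \mid m=1,2,...p\}$.

(5): (3c), (4) $\rightarrow \{E_m^{1,i} \mid m=1,2,...p\} = \{E_n^{1,j} \mid n=1,2,...q\}$.

(6): (1), (5) $\rightarrow \langle \{E_m^{1,i} \mid m=1,2,...p\}, E_F^{1,i} \rangle = \langle \{E_n^{1,j} \mid n=1,2,...q\}, E_F^{1,j} \rangle \rightarrow T^{1,i} = T^{1,j}$.

Case 2: Show: $\forall E_a, E_b \in P^1$, $\mu(E_a) = \mu(E_b) \rightarrow E_a = E_b$.

(1): Consider $\{E_m^{1,i} \mid m=1,2,...p\}$ and $\{E_n^{1,j} \mid n=1,2,...q\}$
$\ni: \{E_m^{1,i} \mid m=1,2,...p\} = \{E_n^{1,j} \mid n=1,2,...q\}$ and $E_m^{1,i}, E_n^{1,j} \in P^1$.
Consider $E_a, E_b \in P^1 \ni: \mu(E_a) = \mu(E_b)$. Then
$\langle \{E_m^{1,i} \mid m=1,2,...p\}, E_a \rangle \in T[V_I^1 \times V_O^1]$ and
$\langle \{E_n^{1,j} \mid n=1,2,...q\}, E_b \rangle \in T[V_I^1 \times V_O^1]$, denoted $T^{1,i}$ and $T^{1,j}$
respectively.

(2): $\psi(T^{1,i}) = \langle \{\mu(E_m^{1,i})\}, \mu(E_a) \rangle$, $\psi(T^{1,j}) = \langle \{\mu(E_n^{1,j})\}, \mu(E_b) \rangle$ and
(1) $\rightarrow \{\mu(E_m^{1,i}) \mid m=1,2,...p\} = \{\mu(E_n^{1,j}) \mid n=1,2,...q\}$.

(3): (2) and $\mu(E_a) = \mu(E_b) \rightarrow \psi(T^{1,i}) = \psi(T^{1,j})$.

(4): (3) and $\psi$ 1:1 $\rightarrow T^{1,i} = T^{1,j} \rightarrow E_a = E_b$.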


Theorem  5.5.9: 


Let $\phi_I$ and $\phi_O$ be the $\phi_I$-map and $\phi_O$-map generating $\psi : T^1 \rightarrow T^2$.
$\phi_I$ and $\phi_O$ are 1:1 maps iff $\psi$ is 1:1 map.


Proof:


Follows directly from Lemmas 5.5.6, 5.5.7, and 5.5.8.


Theorem  5.5.10: 


Let $\psi : T^1 \rightarrow T^2$ be a $\psi$-map generated by $\mu$.

$\mu$ is 1:1 map iff $T^1$ and $T^2$ are isomorphic T-sets.

Proof:

Let $\mu : P^1 \rightarrow P^2$.

Case 1: Show $\psi$ is 1:1 and onto and $\psi$ is morphism.

(1): Show $\psi$ is 1:1.

(1.1): Lemma 5.5.8 and $\mu$ 1:1 $\rightarrow \psi$ is 1:1.

(2): Show $\psi$ is onto.

(2.1): $\psi$ clearly onto.

(3): Show $\psi$ is morphism.

(3.1): By definition of $\psi$ it is a morphism.

Case 2: Show $\mu$ is 1:1 and onto.

(1): Show $\mu$ is 1:1.

(1.1): Lemma 5.5.8 and $\psi$ 1:1 $\rightarrow \mu$ is 1:1.

(2): Show $\mu$ is onto.

(2.1): $\mu$ clearly onto.

□ 

The  following  two  examples  and  Theorem  5.5.13  illustrate  some  properties  of 
the  T-sets. 

Example  5.5.11: 

Consider the structure with the input and output label sets $V_I = \{u_0\}$,
$V_O = \{v_0, v_1\}$.

Describe the T-set over $V_I \times V_O$.

Solution:

(1): $T[V_I \times V_O] = \{\langle E, E_F \rangle\} = \{\langle \{E_m \mid m = 1,2,...n\}, E_F \rangle \mid E_m \in P[V_I \times V_O], E_F \in P[V_I \times V_O]\}$.

(2): $|V_I \times V_O| = |V_I| \times |V_O| = 2$.

(3): $|P[V_I \times V_O]| = 2^2 = 4$.

(4): The number of different $E$ is equal to the number of subsets of
$P[V_I \times V_O]$, which is $2^4$.

(5): The number of different $E_F$ is $|P[V_I \times V_O]|$.

(6): $|T[V_I \times V_O]| = 2^4 \times 4 = 2^6 = 64$.


Example  5.5.12: 

Consider the structure with the input and output label sets
$V_I = \{u_0, u_1\}$, $V_O = \{v_0, v_1, v_2\}$.

Describe the T-set over $V_I \times V_O$.

Solution:

(1): $T[V_I \times V_O] = \{\langle E, E_F \rangle\} = \{\langle \{E_m \mid m = 1,2,...n\}, E_F \rangle \mid E_m \in P[V_I \times V_O], E_F \in P[V_I \times V_O]\}$.

(2): $|V_I \times V_O| = |V_I| \times |V_O| = 6$.

(3): $|P[V_I \times V_O]| = 2^6 = 64$.

(4): The number of different $E$ is equal to the number of subsets of
$P[V_I \times V_O]$, which is $2^{64}$.

(5): The number of different $E_F$ is $|P[V_I \times V_O]|$.

(6): $|T[V_I \times V_O]| = 2^{64} \times 64 = 2^{70}$.

Although $T[V_I \times V_O]$ includes all possible systems over $V_I \times V_O$, it
also includes many structures that are not systems. An instance of such a
structure is any structure that contains the correspondence
$\{\langle u_0,v_0 \rangle, \langle u_1,v_0 \rangle\}$, which is a destructive correspondence.
Because the T-set has fewer restrictions on valid structure members, it is
easier to work with. For the same reason it is a superset of the S-set
$S[V_I \times V_O]$. From the above the following theorem is derived.


Theorem 5.5.13:


Let $S[V_I \times V_O]$ and $T[V_I \times V_O]$ be the S-set over $V_I \times V_O$ and the T-set
over $V_I \times V_O$ respectively. Then the cardinality of the T-set is
$2^{2^{|V_I| \times |V_O|}} \times 2^{|V_I| \times |V_O|}$, which is also an
upper bound on the cardinality of the S-set.

Proof:

(1): $|T[V_I \times V_O]|$ = (number of different $E$) $\times$ (number of different $E_F$)
$= 2^{|P[V_I \times V_O]|} \times 2^{|V_I \times V_O|}$
$= 2^{2^{|V_I \times V_O|}} \times 2^{|V_I \times V_O|}$
$= 2^{2^{|V_I| \times |V_O|}} \times 2^{|V_I| \times |V_O|}$.

(2): $S[V_I \times V_O] \subseteq T[V_I \times V_O] \rightarrow |S[V_I \times V_O]| \le |T[V_I \times V_O]|$
$\rightarrow |T[V_I \times V_O]|$ is an upper bound of the cardinality of
$S[V_I \times V_O]$.

□ 
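The counting argument of Theorem 5.5.13 can be confirmed by brute force for the small case of Example 5.5.11. The following is a minimal sketch, assuming a structure is represented as a pair (family of correspondences, feedback correspondence); the helper names `powerset`, `t_set_cardinality`, and `enumerate_t_set` are hypothetical.

```python
from itertools import chain, combinations, product


def powerset(items):
    """All subsets of items, each returned as a frozenset."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def t_set_cardinality(n_in, n_out):
    """Closed form: 2^(2^(|V_I| x |V_O|)) x 2^(|V_I| x |V_O|)."""
    pairs = n_in * n_out
    return 2 ** (2 ** pairs) * 2 ** pairs


def enumerate_t_set(v_i, v_o):
    """Enumerate every <E, E_F> with E a set of subsets of V_I x V_O."""
    correspondences = powerset(product(v_i, v_o))  # P[V_I x V_O]
    families = powerset(correspondences)           # every possible E
    return {(e, e_f) for e in families for e_f in correspondences}


small_t_set = enumerate_t_set(["u0"], ["v0", "v1"])
```

The enumeration is only feasible for very small label sets, since the count grows doubly exponentially; the label sets of Example 5.5.12 already give $2^{70}$ structures.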

The properties of T-sets will be used in the next section to discuss the
S-sets, which is our primary goal. The mappings $\phi_I$, $\phi_O$, $\phi_{I \times O}$, $\mu$, and $\psi$
will have their counterparts in the domain of S-sets, and some properties
derived in the domain of T-sets will carry into the domain of S-sets.

The results of the preceding discussion can be summarized as follows.
Given $T^1 = T[V_I^1 \times V_O^1]$ and $T^2 = T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$, two T-sets, where
$\phi_I$ is a $\phi_I$-map and $\phi_O$ is a $\phi_O$-map. Then the $\phi_I$ and $\phi_O$ maps uniquely
determine a $\phi_{I \times O}$-map $\phi_{I \times O} : V_I^1 \times V_O^1 \rightarrow \phi_I(V_I^1) \times \phi_O(V_O^1)$.
The $\phi_{I \times O}$-map $\phi_{I \times O}$ then uniquely determines a $\mu$-map $\mu$,
$\mu : P[V_I^1 \times V_O^1] \rightarrow P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.
The $\mu$-map $\mu$ then uniquely determines a $\psi$-map $\psi$,
$\psi : T[V_I^1 \times V_O^1] \rightarrow T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$.

Conversely, given a $\psi$-map $\psi$, it uniquely determines a $\mu$-map $\mu$. The
$\mu$-map $\mu$ then uniquely determines a $\phi_{I \times O}$-map $\phi_{I \times O}$. The $\phi_{I \times O}$-map
$\phi_{I \times O}$ uniquely determines the $\phi_I$ and $\phi_O$ maps. To summarize, the $\psi$-map $\psi$
uniquely determines the $\phi_I$ and $\phi_O$ maps, and the $\phi_I$ and $\phi_O$ maps uniquely
determine a $\psi$-map $\psi$.

Another important result of this section is that certain properties of the $\phi_I$
and $\phi_O$ maps are inherited by the $\psi$-map $\psi$ and vice versa. Specifically proven
here was the important fact that the diagram
$(\phi_I, \phi_O) \leftrightarrow (\phi_{I \times O}) \leftrightarrow (\mu) \leftrightarrow (\psi)$
commutes when each map is 1:1 (Figure 5.1). That means that not only do 1:1
maps $\phi_I$, $\phi_O$ induce a 1:1 $\psi$, but also that a 1:1 $\psi$ induces 1:1 maps $\phi_I$, $\phi_O$.
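The chain of maps summarized above can be sketched in a few lines. This is an illustrative sketch only: it assumes label maps are dictionaries, correspondences are frozensets of pairs, and a structure is a pair (family, feedback). The function name `lift` and the sample labels are hypothetical.

```python
def lift(phi_i, phi_o):
    """Derive phi_IxO, mu, and psi from the two label maps."""

    def phi_ixo(pair):
        a, b = pair
        return (phi_i[a], phi_o[b])

    def mu(correspondence):
        # Image of a set of pairs under phi_IxO.
        return frozenset(phi_ixo(p) for p in correspondence)

    def psi(structure):
        # Apply mu to every member of the family and to the feedback part.
        family, feedback = structure
        return (frozenset(mu(e) for e in family), mu(feedback))

    return phi_ixo, mu, psi


# Hypothetical maps: phi_I merges u1 and u2, so it is not 1:1.
_, _, psi = lift({"u1": "w1", "u2": "w1"}, {"v1": "x1"})

t_a = (frozenset({frozenset({("u1", "v1")})}), frozenset())
t_b = (frozenset({frozenset({("u2", "v1")})}), frozenset())
same_image = psi(t_a) == psi(t_b)
```

Here two distinct structures have the same image, which is the behaviour expected when $\phi_I$ is not 1:1.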


5.6 Quasimorphism


In this section, based upon the concept of morphism of groups, a new
similarity measure between systems is defined that allows a comparison
between arbitrary (regular and irregular) systems. This measure is called
quasimorphism and is completely specified by two mappings called $\phi_I$ and $\phi_O$.
The quasimorphism will facilitate the analysis of the following problems in
parallel processing:

(a) system A emulating system B (three different degrees of strictness of
emulation are discussed);

(b) fault tolerance/reliability (achieved by multiple mapping of the same
problem into a system);

(c) partitioning of a system.

In this section the relationships among systems will be explored. Since
$S[V_I \times V_O] \subseteq T[V_I \times V_O]$, that is, the S-set over $V_I \times V_O$ is a subset of
the T-set over $V_I \times V_O$, it will be shown that most relationships among
systems treated as elements of the T-set carry over from the T-set domain to
the S-set domain, while other relationships carry over in a somewhat weakened
form. All the maps $\phi_I$, $\phi_O$, $\phi_{I \times O}$, $\mu$, and $\psi$ defined in the context of T-sets
will have their counterparts in the context of S-sets. Since the S-set
$S[V_I \times V_O]$ and the T-set $T[V_I \times V_O]$ are both defined over the same underlying
set $V_I \times V_O$, the following maps, defined based on $V_I^1, V_O^1$ and $V_I^2, V_O^2$, are
directly applicable to the analysis of relationships between systems.

Definition  5.6.1: 

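These maps have identical meaning in the context of S-sets as in the
context of T-sets. The maps are:

$\phi_I : V_I^1 \rightarrow \phi_I(V_I^1)$, onto.

$\phi_O : V_O^1 \rightarrow \phi_O(V_O^1)$, onto.

$\phi_{I \times O} : V_I^1 \times V_O^1 \rightarrow \phi_I(V_I^1) \times \phi_O(V_O^1)$, onto.

$\mu : P[V_I^1 \times V_O^1] \rightarrow P[\phi_I(V_I^1) \times \phi_O(V_O^1)]$, onto.

$\psi : T[V_I^1 \times V_O^1] \rightarrow T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$, onto.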

Definitions 5.6.2 and 5.6.3 are intermediate steps used to define the
quasimorphism formally.


Definition  5.6.2: 

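Let $\mu : P^1 \rightarrow P^2$ be a $\mu$-map, generated by $\phi_I$ and $\phi_O$. Let $C^1$ and $C^2$ be
two C-sets. Define a $\bar{\mu}$-correspondence from $C^1$ to $C^2$ to be any
correspondence such that:

$\bar{\mu} : C^1 \rightarrow C^2$; $\bar{\mu} \triangleq \mu|_{C^1 \times C^2}$.

Note that if $\langle C_m^1, C_n^2 \rangle \in \bar{\mu}$ then $\bar{\mu}(C_m^1) = \mu(C_m^1)$. Clearly if $\mu$ is
generated by a $\phi_{I \times O}$ then $\bar{\mu}$ is generated by the same $\phi_{I \times O}$.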

Definition  5.6.3: 

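Let $\psi : T^1 \rightarrow T^2$ be a $\psi$-map, generated by $\phi_I$ and $\phi_O$,
$T^1 = T[V_I^1 \times V_O^1]$, $T^2 = T[\phi_I(V_I^1) \times \phi_O(V_O^1)]$. Let $S^1 = S[V_I^1 \times V_O^1]$ and
$S^2 = S[\phi_I(V_I^1) \times \phi_O(V_O^1)]$ be two S-sets. Define a $\bar{\psi}$-correspondence from
$S^1$ to $S^2$ to be the correspondence such that:

$\bar{\psi} : S^1 \rightarrow S^2$; $\bar{\psi} \triangleq \psi|_{S^1 \times S^2}$.

Note that if $\langle S^{1,i}, S^{2,j} \rangle \in \bar{\psi}$ then $\bar{\psi}(S^{1,i}) = \psi(S^{1,i})$. Clearly if $\psi$ is
generated by a $\mu$ then $\bar{\psi}$ is generated by the same $\mu$.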

Definition  5.6.4: 

Let $\bar{\psi} : S^1 \rightarrow S^2$ be a $\bar{\psi}$-correspondence. Let $\langle S^{1,i}, S^{2,j} \rangle \in \bar{\psi}$. Then $\bar{\psi}$ is
called a quasimorphism from $S^{1,i}$ to $S^{2,j}$.

Lemmas 5.6.5 to 5.6.7 describe the inheritance of properties between the
auxiliary correspondences. These results will be used in Theorems 5.6.8
and 5.6.9 to describe the inheritance of properties between the elementary
maps $\phi_I$ and $\phi_O$ and the $\bar{\psi}$-correspondence, which is the basis of quasimorphism.


Lemma  5.6.5: 


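Let $\bar{\mu} : C^1 \rightarrow C^2$ be a $\bar{\mu}$-correspondence generated by $\phi_I$ and $\phi_O$,
$C^1 = C[V_I^1 \times V_O^1]$, $C^2 = C[\phi_I(V_I^1) \times \phi_O(V_O^1)]$. Then $\bar{\mu}$ is an onto
correspondence.

Proof:

Show $\forall C_n^{2,j} \in C^2 \; \exists C_m^{1,i} \in C^1$, $\bar{\mu}(C_m^{1,i}) = C_n^{2,j}$.

(1): Let $C_n^{2,j} = \{\langle v_a,v_b \rangle, ...\}$ and $C_m^{1,i} = \{\langle u_a,u_b \rangle, ...\}$.

(2): $C_n^{2,j} \in C^2 \rightarrow C_n^{2,j} \in P^2$.

(3): (2), Definition 5.5.4 $\rightarrow \exists E_m^{1,i} \in P^1$, $\mu(E_m^{1,i}) = C_n^{2,j}$.

(4): Construct $C_m^{1,i} \subseteq E_m^{1,i}$, $C_m^{1,i} \in C^1$, $\mu(C_m^{1,i}) = C_n^{2,j}$.

(4.1): If $E_m^{1,i}$ nondestructive then go to (5) else: $\exists \langle u_a,u_b \rangle, \langle u_c,u_b \rangle \in E_m^{1,i}$
$\rightarrow \phi_{I \times O}(\langle u_a,u_b \rangle), \phi_{I \times O}(\langle u_c,u_b \rangle) \in C_n^{2,j}$
$\rightarrow \langle \phi_I(u_a), \phi_O(u_b) \rangle, \langle \phi_I(u_c), \phi_O(u_b) \rangle \in C_n^{2,j}$.

(4.2): $C_n^{2,j}$ nondestructive $\rightarrow \phi_I(u_a) = \phi_I(u_c)$.
Let $C_m^{1,i} = E_m^{1,i} - \{\langle u_a,u_b \rangle \mid \text{singleton}\}$.

(4.3): Let $E_m^{1,i} = C_m^{1,i}$ and go to (4.1).

(5): $C_m^{1,i} = E_m^{1,i} \rightarrow C_m^{1,i} \in C^1$ and $\bar{\mu}(C_m^{1,i}) = C_n^{2,j}$.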

□ 
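The construction in steps (4.1) to (4.3) of the proof is a small pruning loop, which can be sketched as follows. This is a sketch under stated assumptions, not the thesis's procedure verbatim: "destructive" is taken to mean that two distinct inputs are routed to the same output (as in the example $\{\langle u_0,v_0 \rangle, \langle u_1,v_0 \rangle\}$ of Section 5.5), and the names `mu`, `is_destructive`, and `prune_to_nondestructive` are hypothetical.

```python
def mu(correspondence, phi_i, phi_o):
    """Image of a set of <input, output> pairs under phi_IxO."""
    return frozenset((phi_i[a], phi_o[b]) for a, b in correspondence)


def is_destructive(correspondence):
    """Assumed meaning: two distinct pairs share the same output label."""
    outputs = [b for _, b in correspondence]
    return len(outputs) != len(set(outputs))


def prune_to_nondestructive(e):
    """Drop one pair of every clashing couple until no output is shared."""
    current = set(e)
    while True:
        clash = next(((p, q) for p in current for q in current
                      if p != q and p[1] == q[1]), None)
        if clash is None:
            return frozenset(current)
        current.discard(clash[0])  # step (4.2): remove one offending pair


# Hypothetical data: the image of e is nondestructive, e itself is not.
phi_i = {"u1": "w1", "u2": "w1"}
phi_o = {"v1": "x1", "v2": "x2"}
e = frozenset({("u1", "v1"), ("u2", "v1"), ("u2", "v2")})
pruned = prune_to_nondestructive(e)
```

Because the image of `e` is nondestructive, any two pairs of `e` that share an output have inputs with the same image, so removing one of them leaves the image unchanged.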


Lemma  5.6.6: 

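Let $\phi_{I \times O}$ be the $\phi_{I \times O}$-map generating $\bar{\mu}$.

$\phi_{I \times O}$ is 1:1 map iff $\bar{\mu}$ is 1:1 correspondence.

Proof:

Let $\bar{\mu} : C^1 \rightarrow C^2$ be the $\bar{\mu}$-correspondence.

Case 1: Show: $\bar{\mu}$ is 1:1 correspondence.

(1): $\phi_{I \times O}$ 1:1 map and Lemma 5.5.7 $\rightarrow \mu$ is 1:1 map.

(2): (1) and $\bar{\mu}$ restriction correspondence of $\mu \rightarrow \bar{\mu}$ is 1:1
correspondence.

Case 2: Show: $\forall \langle u_a,u_b \rangle, \langle v_a,v_b \rangle \in V_I^1 \times V_O^1$,
$\phi_{I \times O}(\langle u_a,u_b \rangle) = \phi_{I \times O}(\langle v_a,v_b \rangle) \rightarrow \langle u_a,u_b \rangle = \langle v_a,v_b \rangle$.

(1): Let $C_m, C_n \in C^1$, $C_m = \{\langle u_a,u_b \rangle \mid \text{singleton}\}$,
$C_n = \{\langle v_a,v_b \rangle \mid \text{singleton}\}$.
$\phi_{I \times O}(\langle u_a,u_b \rangle) = \phi_{I \times O}(\langle v_a,v_b \rangle)$
$\rightarrow \{\phi_{I \times O}(\langle u_a,u_b \rangle) \mid \text{singleton}\} = \{\phi_{I \times O}(\langle v_a,v_b \rangle) \mid \text{singleton}\}$
$\rightarrow \bar{\mu}(C_m) = \bar{\mu}(C_n)$.

(2): $\bar{\mu}$ 1:1 and (1) $\rightarrow C_m = C_n$
$\rightarrow \{\langle u_a,u_b \rangle \mid \text{singleton}\} = \{\langle v_a,v_b \rangle \mid \text{singleton}\}$
$\rightarrow \langle u_a,u_b \rangle = \langle v_a,v_b \rangle$.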


Lemma  5.6.7: 


Let $\mu$ be a $\mu$-map. Let $\bar{\mu}$ be the restriction $\bar{\mu}$-correspondence
of the $\mu$.

$\mu$ is 1:1 map iff $\bar{\mu}$ is 1:1 correspondence.

Proof:

Let $\bar{\mu} : C^1 \rightarrow C^2$ be the $\bar{\mu}$-correspondence.

Case 1: Show: $\bar{\mu}$ is 1:1 correspondence.

(1): $\mu$ 1:1 map and $\bar{\mu}$ restriction of $\mu \rightarrow \bar{\mu}$ is 1:1 correspondence.

Case 2: Show $\mu$ is 1:1 map.

(1): $\bar{\mu}$ 1:1 and Lemma 5.6.6 $\rightarrow \phi_{I \times O}$ is 1:1 map.

(2): (1) and Lemma 5.5.7 $\rightarrow \mu$ is 1:1 map.


The next two theorems relate the properties of $\phi_I$ and $\phi_O$ and the
$\bar{\psi}$-correspondence. The significance of these properties is described in
detail at the end of this section.

Theorem  5.6.8: 

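Let $\bar{\psi} : S^1 \rightarrow S^2$ be the $\bar{\psi}$-correspondence, generated by $\phi_I$ and $\phi_O$. If
$\phi_I$ and $\phi_O$ are 1:1 maps then $\bar{\psi}$ is 1:1 correspondence.

Proof:

(1): Theorem 5.5.9 $\rightarrow \psi$ is 1:1 map.

(2): $\bar{\psi}$ restriction correspondence of $\psi \rightarrow \bar{\psi}$ is 1:1 correspondence.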


Theorem 5.6.9:

Let $\bar{\psi} : S^1 \rightarrow S^2$ be the $\bar{\psi}$-correspondence generated by $\phi_I$ and $\phi_O$.

If $\bar{\psi}$ is 1:1 correspondence then $\phi_I$ and $\phi_O$ are 1:1 maps.

Proof:

The proof is by contradiction. Let $\bar{\psi}$ be 1:1 correspondence and assume
that $\phi_I$ or $\phi_O$ (or both) is (are) not 1:1. For each case construct
$S^{1,i}, S^{1,j} \in S^1 \ni: \bar{\psi}(S^{1,i}) = \bar{\psi}(S^{1,j}) \in S^2$ and $S^{1,i} \ne S^{1,j}$,
therefore implying $\bar{\psi}$ is not 1:1 correspondence, which is a contradiction.

Case 1: $\phi_I$ not 1:1 map. Let $V_I^1 = \{u_1,u_2,...u_m\}$, $V_O^1 = \{v_1,v_2,...v_n\}$,
$\phi_I(V_I^1) = \{w_1,w_2,...w_r\}$, $\phi_O(V_O^1) = \{x_1,x_2,...x_s\}$.

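(1): $\phi_I$ not 1:1 $\rightarrow \exists u_a,u_b \in V_I^1$, $\phi_I(u_a) = \phi_I(u_b)$.

(2): Construct $S^{1,i} \in S^1$ as follows:

(2.1): $C^{1,i} = \{C_p^{1,i} \mid p=1,2,...m\} \cup \{\emptyset_c\} = \{\{\langle u_p,v_c \rangle \mid v_c \in V_O^1\} \mid u_p \in V_I^1, p=1,2,...m\} \cup \{\emptyset_c\}$.

(2.2): $C_F^{1,i} = \{\langle u_a,v_1 \rangle \mid \text{singleton}\}$.

(2.3): $S^{1,i} = \langle C^{1,i}, C_F^{1,i} \rangle$.

(3): Construct $S^{1,j} \in S^1$ as follows:

(3.1): $C^{1,j} = C^{1,i}$.

(3.2): $C_F^{1,j} = \{\langle u_b,v_1 \rangle \mid \text{singleton}\}$.

(3.3): $S^{1,j} = \langle C^{1,j}, C_F^{1,j} \rangle$.

(4): (1), (2), and (3) $\rightarrow \{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\} = \{\mu(\emptyset_c), \mu(C_p^{1,j}) \mid p=1,2,...m\}$.

(5): $\mu(C_F^{1,i}) = \mu(\{\langle u_a,v_1 \rangle \mid \text{singleton}\}) = \{\langle \phi_I(u_a), \phi_O(v_1) \rangle \mid \text{singleton}\}$
$= \{\langle \phi_I(u_b), \phi_O(v_1) \rangle \mid \text{singleton}\} = \mu(\{\langle u_b,v_1 \rangle \mid \text{singleton}\}) = \mu(C_F^{1,j})$.

(6): (4), (5) $\rightarrow \psi(S^{1,i}) = \psi(S^{1,j}) \in T^2$.

(7): (2.2), (3.2) $\rightarrow C_F^{1,i} \ne C_F^{1,j} \rightarrow S^{1,i} \ne S^{1,j}$.

(8): Show $\psi(S^{1,i}) = \psi(\langle C^{1,i}, C_F^{1,i} \rangle) \in S^2$.

(8.1): Show $\langle \{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\} \rangle \in K^2$.

(8.1.1): Show $\{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\} \subseteq C^2$.

(8.1.1.1): $\mu(\emptyset_c) = \emptyset_c \in C^2$.

(8.1.1.2): $\mu(C_p^{1,i}) = \mu(\{\langle u_p,v_c \rangle \mid v_c \in V_O^1\}) = \{\langle \phi_I(u_p), \phi_O(v_c) \rangle \mid v_c \in V_O^1\} \in C^2$.

(8.1.1.3): (8.1.1.1), (8.1.1.2) $\rightarrow \{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\} \subseteq C^2$.

(8.1.2): Show $\phi_I(V_I^1) = s(\{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\})$.

(8.1.2.1): $s(\{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\}) = s(\{\mu(C_p^{1,i}) \mid p=1,2,...m\})$
$= s(\{\mu(\{\langle u_p,v_c \rangle \mid v_c \in V_O^1\}) \mid u_p \in V_I^1\})$
$= s(\{\{\langle \phi_I(u_p), \phi_O(v_c) \rangle \mid v_c \in V_O^1\} \mid u_p \in V_I^1\})$
$= \{\phi_I(u_p) \mid u_p \in V_I^1\} = \phi_I(V_I^1)$.

(8.1.3): Show $\phi_O(V_O^1) = d(\{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\})$.

(8.1.3.1): $d(\{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\}) = d(\{\mu(C_p^{1,i}) \mid p=1,2,...m\})$
$= d(\{\mu(\{\langle u_p,v_c \rangle \mid v_c \in V_O^1\}) \mid u_p \in V_I^1\})$
$= d(\{\{\langle \phi_I(u_p), \phi_O(v_c) \rangle \mid v_c \in V_O^1\} \mid u_p \in V_I^1\})$
$= \{\phi_O(v_c) \mid v_c \in V_O^1\} = \phi_O(V_O^1)$.

(8.1.4): Show $|\{\mu(\emptyset_c), \mu(C_p^{1,i}) \ni: p=1,2,...m\}| \ge 2$.

(8.1.4.1): $\mu(\emptyset_c) = \emptyset_c$, $\mu(C_p^{1,i}) \ne \emptyset_c \rightarrow |\{\mu(\emptyset_c), \mu(C_p^{1,i}) \ni: p=1,2,...m\}| \ge 2$.

(8.1.5): (8.1.1), (8.1.2), (8.1.3), and (8.1.4) $\rightarrow \langle \{\mu(\emptyset_c), \mu(C_p^{1,i}) \mid p=1,2,...m\} \rangle \in K^2$.

(8.2): Show $\mu(C_F^{1,i})$ is a feedback correspondence over
$\phi_I(V_I^1) \times \phi_O(V_O^1)$.

(8.2.1): $\mu(C_F^{1,i}) = \mu(\{\langle u_a,v_1 \rangle \mid \text{singleton}\}) = \{\langle \phi_I(u_a), \phi_O(v_1) \rangle \mid \text{singleton}\}$
$\rightarrow \mu(C_F^{1,i})$ is a feedback correspondence over
$\phi_I(V_I^1) \times \phi_O(V_O^1)$.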



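(8.3): (8.1), (8.2) $\rightarrow \psi(S^{1,i}) \in S^2$.

(9): (8) $\rightarrow \langle S^{1,i}, S^{2,k} \rangle \in \bar{\psi}$ and $\langle S^{1,j}, S^{2,k} \rangle \in \bar{\psi}$.

(10): (6), (9) $\rightarrow \bar{\psi}(S^{1,i}) = \bar{\psi}(S^{1,j})$.

(11): (7), (10) $\rightarrow \bar{\psi}$ not 1:1 $\rightarrow$ contradiction $\rightarrow \phi_I$ is 1:1 map.

Case 2: $\phi_O$ not 1:1 map. Let $V_I^1 = \{u_1,u_2,...u_m\}$, $V_O^1 = \{v_1,v_2,...v_n\}$,
$\phi_I(V_I^1) = \{w_1,w_2,...w_r\}$, $\phi_O(V_O^1) = \{x_1,x_2,...x_s\}$.

(1): $\phi_O$ not 1:1 $\rightarrow \exists v_a,v_b \in V_O^1$, $\phi_O(v_a) = \phi_O(v_b)$.

(2): Steps (2) to (11) are the same as in Case 1 except (2.2) and (3.2).

(2.2): $C_F^{1,i} = \{\langle u_1,v_a \rangle \mid \text{singleton}\}$.

(3.2): $C_F^{1,j} = \{\langle u_1,v_b \rangle \mid \text{singleton}\}$.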

□ 

The theoretical work presented in this section has the following physical
implications. Given two systems $S^1$ and $S^2$ with arbitrary vertex descriptions,
if there exists a $\bar{\psi}$, that is, a $\phi_I$ and $\phi_O$ with the proper constraints, from
$S^1$ to $S^2$, then $S^1$ and $S^2$ are similar in some sense. If the quasimorphism $\bar{\psi}$
is 1:1, then in fact the systems are isomorphic, that is, identical up to the
naming of the vertices. The $\bar{\psi} = \langle \phi_I, \phi_O \rangle$ can be used in the following
problems: (1) emulation of systems; (2) identifying equivalent systems; and
(3) partitioning of a network.

The following two theorems discuss some basic properties of the
quasimorphism $\bar{\psi}$. In the study of mathematical relations, three properties
are of utmost importance: reflexivity, symmetry, and transitivity. If all
three hold, the relation is an equivalence relation. Although the
$\bar{\psi}$-correspondence is not a relation, properties similar to the three above
can be defined. They do have physical significance concerning quasimorphism
between systems, as described at the end of this section.

Theorem  5.6.10: 


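Let $S^1$, $S^2$, and $S^3$ be three systems. The quasimorphism $\bar{\psi}$ has the
following properties.

(1) $\exists \bar{\psi}$ such that $\bar{\psi}(S^1) = S^1$.

(2) $\bar{\psi}^1(S^1) = S^2 \not\rightarrow \exists \bar{\psi}^2$ such that $\bar{\psi}^2(S^2) = S^1$.

(3) $\bar{\psi}^1(S^1) = S^2$ and $\bar{\psi}^2(S^2) = S^3 \rightarrow \exists \bar{\psi}$, $\bar{\psi}(S^1) = S^3$.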

Proof: 

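(1): Need to show $\exists \bar{\psi}$ such that $\bar{\psi}(S^1) = S^1$. Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$
be such that $\phi_I$, $\phi_O$ are identity maps. The rest is straightforward.

(2): Must show $\bar{\psi}^1(S^1) = S^2 \not\rightarrow \exists \bar{\psi}^2$ such that $\bar{\psi}^2(S^2) = S^1$.
Construct an example of $S^1$ and $S^2$ such that $\langle S^1, S^2 \rangle \in \bar{\psi}^1$ and
there does not exist $\bar{\psi}^2 \ni: \bar{\psi}^2(S^2) = S^1$.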


Let  Let 

S*  =  <C‘,C/>,  S®  =  <C®  C^>, 

V,*  =  {v.,Vb},  Vi  =  {u„ud},  V,®  =  {w.}.  Vi  =  {x,.x,}, 

=  {<v„Uc>,  <Vj,,U4>},  =  {<w„Xe>}, 

C*  =  {Co‘,  C/},  .  C*  =  {C*,C?}, 

Co  =  {<Vb.“d>}»  ^0  =  {<Wb,Xc>}, 

C*  =  {<v„u^>,<v^,uj>}.  C?  =  {<w.,x^>,<w^,xj>}. 

Then  0  =  <0i,0o>  with  0i(vj  =  w.,  0,(vb)  =  w,,  0o(Uc)  =  *d. 
0o(Ud)  =  x,  is  a  quasimorphism  0,  0  (S^)  =  S*.  but  there  does 


not exist a quasimorphism from $S^2$ to $S^1$.

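(3): Must show: $\bar{\psi}^1(S^1) = S^2$, $\bar{\psi}^2(S^2) = S^3 \rightarrow \exists \bar{\psi}$, $\bar{\psi}(S^1) = S^3$.
This will be shown by exhibiting a quasimorphism $\bar{\psi}$, $\bar{\psi}(S^1) = S^3$.

(1): Let $S^1 = \langle C^1, C_F^1 \rangle$; $S^2 = \langle C^2, C_F^2 \rangle$; and $S^3 = \langle C^3, C_F^3 \rangle$
be three systems.

(2): $\bar{\psi}^1(S^1) = S^2$
$\rightarrow \exists \phi_I^1 : V_I^1 \rightarrow \phi_I^1(V_I^1)$, and $\exists \phi_O^1 : V_O^1 \rightarrow \phi_O^1(V_O^1)$.

(3): (2) $\rightarrow \exists \phi_{I \times O}^1 : V_I^1 \times V_O^1 \rightarrow \phi_I^1(V_I^1) \times \phi_O^1(V_O^1)$ and
$\exists \mu^1 : P[V_I^1 \times V_O^1] \rightarrow P[\phi_I^1(V_I^1) \times \phi_O^1(V_O^1)]$.

(4): $\bar{\psi}^2(S^2) = S^3$
$\rightarrow \exists \phi_I^2 : \phi_I^1(V_I^1) \rightarrow \phi_I^2(\phi_I^1(V_I^1))$, and
$\exists \phi_O^2 : \phi_O^1(V_O^1) \rightarrow \phi_O^2(\phi_O^1(V_O^1))$.

(5): (4) $\rightarrow \exists \phi_{I \times O}^2 : \phi_I^1(V_I^1) \times \phi_O^1(V_O^1) \rightarrow \phi_I^2(\phi_I^1(V_I^1)) \times \phi_O^2(\phi_O^1(V_O^1))$, and
$\exists \mu^2 : P[\phi_I^1(V_I^1) \times \phi_O^1(V_O^1)] \rightarrow P[\phi_I^2(\phi_I^1(V_I^1)) \times \phi_O^2(\phi_O^1(V_O^1))]$.

(6): Define: $\phi_I = \phi_I^2 \circ \phi_I^1 : V_I^1 \rightarrow \phi_I^2(\phi_I^1(V_I^1))$. ("$\circ$" is
composition of maps.)
Clearly: $\phi_I$ is a map.

(7): Define: $\phi_O = \phi_O^2 \circ \phi_O^1 : V_O^1 \rightarrow \phi_O^2(\phi_O^1(V_O^1))$.
Clearly: $\phi_O$ is a map.

(8): Define: $\phi_{I \times O} = \phi_{I \times O}^2 \circ \phi_{I \times O}^1 : V_I^1 \times V_O^1 \rightarrow \phi_I^2(\phi_I^1(V_I^1)) \times \phi_O^2(\phi_O^1(V_O^1))$.
Clearly: $\phi_{I \times O}$ is a map.

(9): Define: $\mu = \mu^2 \circ \mu^1 : P[V_I^1 \times V_O^1] \rightarrow P[\phi_I^2(\phi_I^1(V_I^1)) \times \phi_O^2(\phi_O^1(V_O^1))]$.
Clearly: $\mu$ is a map.

(10): Claim: $\bar{\psi} = \langle \phi_I, \phi_O \rangle$ is a quasimorphism, $\bar{\psi}(S^1) = S^3$.

(11): Show: $\mu(C_F^1) = C_F^3$.
$\bar{\psi}^1(S^1) = S^2 \rightarrow \mu^1(C_F^1) = C_F^2$.
$\bar{\psi}^2(S^2) = S^3 \rightarrow \mu^2(C_F^2) = C_F^3$
$\rightarrow \mu^2(\mu^1(C_F^1)) = C_F^3 \rightarrow (\mu^2 \circ \mu^1)(C_F^1) = \mu(C_F^1) = C_F^3$.

(12): Show: $\forall C_m^1 \in C^1 \; \exists C_p^3 \in C^3 \ni: \mu(C_m^1) = C_p^3$.

(a): $\bar{\psi}^1(S^1) = S^2$
$\rightarrow \forall C_m^1 \in C^1 \; \exists C_n^2 \in C^2 \ni: \mu^1(C_m^1) = C_n^2$.

(b): $\bar{\psi}^2(S^2) = S^3$
$\rightarrow \forall C_n^2 \in C^2 \; \exists C_p^3 \in C^3 \ni: \mu^2(C_n^2) = C_p^3$.

(c): (a), (b) $\rightarrow \forall C_m^1 \in C^1 \; \exists C_p^3 \in C^3$
$\ni: \mu^2(\mu^1(C_m^1)) = \mu^2(C_n^2) = C_p^3$.

(13): (11) and (12) $\rightarrow \bar{\psi}$ is a quasimorphism, $\bar{\psi}(S^1) = S^3$.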

□ 
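Step (9) of the transitivity argument, $\mu = \mu^2 \circ \mu^1$, can be checked numerically on a small case. The sketch below assumes dictionary label maps and frozenset correspondences; the labels and the helper names `mu_from` and `compose` are hypothetical and not taken from the thesis.

```python
def mu_from(phi_i, phi_o):
    """Return the mu-map (image of a set of pairs) induced by two label maps."""
    return lambda c: frozenset((phi_i[a], phi_o[b]) for a, b in c)


def compose(second, first):
    """Dictionary form of (second o first)."""
    return {key: second[value] for key, value in first.items()}


# Hypothetical two-stage relabeling.
phi_i1 = {"u1": "w1", "u2": "w2", "u3": "w2"}
phi_o1 = {"v1": "x1", "v2": "x2"}
phi_i2 = {"w1": "y1", "w2": "y1"}
phi_o2 = {"x1": "z1", "x2": "z2"}

mu1 = mu_from(phi_i1, phi_o1)
mu2 = mu_from(phi_i2, phi_o2)
mu = mu_from(compose(phi_i2, phi_i1), compose(phi_o2, phi_o1))

c = frozenset({("u1", "v1"), ("u3", "v2")})
two_steps = mu2(mu1(c))
one_step = mu(c)
```

Applying the two induced maps in sequence gives the same correspondence as applying the map induced by the composed label maps.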

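Theorem 5.6.11:


Let $S^1$, $S^2$, and $S^3$ be three systems. The 1:1 quasimorphism $\bar{\psi}$ has the
following properties.

(1) $\exists \bar{\psi}$ 1:1 such that $\bar{\psi}(S^1) = S^1$.

(2) $\bar{\psi}^1(S^1) = S^2$, $\bar{\psi}^1$ 1:1 $\rightarrow \exists \bar{\psi}^2$ 1:1 such that $\bar{\psi}^2(S^2) = S^1$.

(3) $\bar{\psi}^1(S^1) = S^2$, $\bar{\psi}^1$ 1:1, and $\bar{\psi}^2(S^2) = S^3$, $\bar{\psi}^2$ 1:1 $\rightarrow \exists \bar{\psi}$ 1:1,
$\bar{\psi}(S^1) = S^3$.


Proof:

(1): Need to show $\exists \bar{\psi}$ such that $\bar{\psi}(S^1) = S^1$. Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$
be such that $\phi_I$, $\phi_O$ are identity maps. The rest is straightforward.

(2): Must show $\bar{\psi}^1(S^1) = S^2 \rightarrow \exists \bar{\psi}^2$ such that $\bar{\psi}^2(S^2) = S^1$.

(1): $\langle S^1, S^2 \rangle \in \bar{\psi}^1$, 1:1 $\rightarrow \exists \phi_I^1 : V_I^1 \rightarrow \phi_I^1(V_I^1)$, 1:1 and
$\exists \phi_O^1 : V_O^1 \rightarrow \phi_O^1(V_O^1)$, 1:1.

(2): (1) $\rightarrow \exists \phi_I^2 : \phi_I^1(V_I^1) \rightarrow V_I^1$, 1:1, $\phi_I^2 = (\phi_I^1)^{-1}$, and
$\exists \phi_O^2 : \phi_O^1(V_O^1) \rightarrow V_O^1$, 1:1, $\phi_O^2 = (\phi_O^1)^{-1}$.

(3): (2) $\rightarrow \exists \phi_{I \times O}^2 : \phi_I^1(V_I^1) \times \phi_O^1(V_O^1) \rightarrow V_I^1 \times V_O^1$, 1:1,
$\phi_{I \times O}^2 = (\phi_{I \times O}^1)^{-1}$.

(4): (3) $\rightarrow \exists \mu^2 : P[\phi_I^1(V_I^1) \times \phi_O^1(V_O^1)] \rightarrow P[V_I^1 \times V_O^1]$, 1:1,
$\mu^2 = (\mu^1)^{-1}$.

The rest is straightforward.

(3): The proof is similar to the proof of (3) of Theorem 5.6.10 except
the maps are 1:1.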


□ 


Following are two examples of quasimorphism between systems, one where the
quasimorphism is not 1:1 and the other where it is 1:1.


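Example 5.6.12:


Consider the following two systems $S^1 = \langle C^1, C_F^1 \rangle$, $S^2 = \langle C^2, C_F^2 \rangle$;
see Figure 5.2.

$S^1 = \langle C^1, C_F^1 \rangle$,
$V_I^1 = \{u_1, u_2, u_3\}$,
$V_O^1 = \{v_1, v_2, v_3\}$,
$C_F^1 = \emptyset$,
$C^1 = \{C_1^1, C_2^1, C_3^1\}$,
$C_1^1 = \{\langle u_1, v_1 \rangle, \langle u_1, v_2 \rangle\}$,
$C_2^1 = \{\langle u_3, v_2 \rangle, \langle u_3, v_3 \rangle\}$,
$C_3^1 = \{\langle u_1, v_1 \rangle, \langle u_2, v_2 \rangle\}$.

$S^2 = \langle C^2, C_F^2 \rangle$,
$V_I^2 = \{w_1, w_2\}$,
$V_O^2 = \{x_1, x_2, x_3\}$,
$C_F^2 = \emptyset$,
$C^2 = \{C_1^2, C_2^2, C_3^2\}$,
$C_1^2 = \{\langle w_1, x_1 \rangle, \langle w_1, x_2 \rangle\}$,
$C_2^2 = \{\langle w_2, x_2 \rangle, \langle w_2, x_3 \rangle\}$,
$C_3^2 = \{\langle w_1, x_1 \rangle, \langle w_2, x_2 \rangle\}$.

Find a quasimorphism from $S^1$ to $S^2$.

Solution:

Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$, $\phi_I(u_1) = w_1$, $\phi_I(u_2) = w_2$, $\phi_I(u_3) = w_2$,
$\phi_O(v_1) = x_1$, $\phi_O(v_2) = x_2$, $\phi_O(v_3) = x_3$.

$\bar{\psi}(S^1) = \langle \{\mu(C_m^1) \mid m=1,2,3\}, \mu(C_F^1) \rangle = S^2$. Therefore $\bar{\psi}$ is a
quasimorphism. Since $\phi_I(u_2) = \phi_I(u_3)$, $\bar{\psi}$ is not 1:1.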

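Example 5.6.13:

Consider the following two systems $S^1 = \langle C^1, C_F^1 \rangle$, $S^2 = \langle C^2, C_F^2 \rangle$;
see Figure 5.3.

$S^1 = \langle C^1, C_F^1 \rangle$,
$V_I^1 = \{u_1, u_2, u_3\}$,
$V_O^1 = \{v_1, v_2, v_3\}$,
$C_F^1 = \emptyset$,
$C^1 = \{C_1^1, C_2^1, C_3^1, C_4^1\}$,
$C_1^1 = \{\langle u_1, v_1 \rangle, \langle u_2, v_2 \rangle\}$,
$C_2^1 = \{\langle u_1, v_2 \rangle, \langle u_2, v_1 \rangle\}$,
$C_3^1 = \{\langle u_3, v_3 \rangle\}$,
$C_4^1 = \{\langle u_2, v_3 \rangle, \langle u_3, v_2 \rangle\}$.

$S^2 = \langle C^2, C_F^2 \rangle$,
$V_I^2 = \{w_1, w_2, w_3\}$,
$V_O^2 = \{x_1, x_2, x_3\}$,
$C_F^2 = \emptyset$,
$C^2 = \{C_1^2, C_2^2, C_3^2, C_4^2\}$,
$C_1^2 = \{\langle w_1, x_1 \rangle, \langle w_2, x_2 \rangle\}$,
$C_2^2 = \{\langle w_1, x_2 \rangle, \langle w_2, x_1 \rangle\}$,
$C_3^2 = \{\langle w_3, x_3 \rangle\}$,
$C_4^2 = \{\langle w_1, x_3 \rangle, \langle w_3, x_1 \rangle\}$.

Find a quasimorphism from $S^1$ to $S^2$.

Solution:

Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$, $\phi_I(u_1) = w_2$, $\phi_I(u_2) = w_1$, $\phi_I(u_3) = w_3$,
$\phi_O(v_1) = x_2$, $\phi_O(v_2) = x_1$, $\phi_O(v_3) = x_3$.

$\bar{\psi}(S^1) = \langle \{\bar{\mu}(C_m^1) \mid m=1,2,3,4\}, \bar{\mu}(C_F^1) \rangle = S^2$. Therefore $\bar{\psi}$ is a
quasimorphism. Since $\phi_I(u_a) \ne \phi_I(u_b)$ and $\phi_O(v_a) \ne \phi_O(v_b)$ for all
distinct $u_a, u_b$ and distinct $v_a, v_b$, $\bar{\psi}$ is 1:1.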
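The two examples can be checked mechanically. The sketch below is illustrative only: the pair listings are a reading of the (partly damaged) listings of Examples 5.6.12 and 5.6.13 and should be treated as an assumption, and the helper names `fs`, `image`, and `is_one_to_one` are hypothetical.

```python
def fs(*pairs):
    """Build one correspondence as a frozenset of <input, output> pairs."""
    return frozenset(pairs)


def image(system, phi_i, phi_o):
    """Apply mu to every network correspondence and to the feedback one."""
    network, feedback = system

    def mu(corr):
        return frozenset((phi_i[a], phi_o[b]) for a, b in corr)

    return (frozenset(mu(c) for c in network), mu(feedback))


def is_one_to_one(mapping):
    return len(set(mapping.values())) == len(mapping)


# Example 5.6.12 (assumed reading of the listing).
s1_a = (frozenset({fs(("u1", "v1"), ("u1", "v2")),
                   fs(("u3", "v2"), ("u3", "v3")),
                   fs(("u1", "v1"), ("u2", "v2"))}), fs())
s2_a = (frozenset({fs(("w1", "x1"), ("w1", "x2")),
                   fs(("w2", "x2"), ("w2", "x3")),
                   fs(("w1", "x1"), ("w2", "x2"))}), fs())
phi_i_a = {"u1": "w1", "u2": "w2", "u3": "w2"}
phi_o_a = {"v1": "x1", "v2": "x2", "v3": "x3"}

# Example 5.6.13 (assumed reading of the listing).
s1_b = (frozenset({fs(("u1", "v1"), ("u2", "v2")),
                   fs(("u1", "v2"), ("u2", "v1")),
                   fs(("u3", "v3")),
                   fs(("u2", "v3"), ("u3", "v2"))}), fs())
s2_b = (frozenset({fs(("w1", "x1"), ("w2", "x2")),
                   fs(("w1", "x2"), ("w2", "x1")),
                   fs(("w3", "x3")),
                   fs(("w1", "x3"), ("w3", "x1"))}), fs())
phi_i_b = {"u1": "w2", "u2": "w1", "u3": "w3"}
phi_o_b = {"v1": "x2", "v2": "x1", "v3": "x3"}

maps_a_onto_target = image(s1_a, phi_i_a, phi_o_a) == s2_a
maps_b_onto_target = image(s1_b, phi_i_b, phi_o_b) == s2_b
```

Under this reading both label-map pairs carry $S^1$ exactly onto $S^2$; only the second pair is 1:1.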

The genealogy of maps and correspondences discussed in this and
preceding sections can be seen in the diagram in Figure 5.1. The diagram
shows how the maps and correspondences defined in this and previous sections
are related. A line with an arrow indicates that the map (correspondence) at
the head of the arrow was defined by using the map (correspondence) at the
tail of the arrow. For example, the $\mu$-map $\mu$ was used to define the
$\bar{\mu}$-correspondence $\bar{\mu}$. In Section 5.5 the left side and the root of the tree
were explored. In this section the right side of the tree was explored. It was
shown that the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ uniquely determine a
$\phi_{I \times O}$-map $\phi_{I \times O}$. The $\phi_{I \times O}$-map $\phi_{I \times O}$ uniquely determines a $\mu$-map $\mu$.
The $\mu$-map $\mu$ uniquely determines a $\bar{\mu}$-correspondence $\bar{\mu}$ and the $\psi$-map $\psi$.
The $\psi$-map $\psi$ uniquely determines a $\bar{\psi}$-correspondence $\bar{\psi}$. Therefore the
$\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ uniquely determine a $\bar{\psi}$-correspondence $\bar{\psi}$.
Similarly, the reverse of the procedure can be used to show that the
$\bar{\psi}$-correspondence $\bar{\psi}$ uniquely determines a $\phi_I$-map $\phi_I$ and a $\phi_O$-map $\phi_O$.

Several properties are also inherited from some maps by others. In
particular, if the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ are both 1:1 then the
$\bar{\psi}$-correspondence $\bar{\psi}$ is also a 1:1 correspondence. It is more surprising
that the converse holds as well, that is, if the $\bar{\psi}$-correspondence $\bar{\psi}$ is a
1:1 correspondence, then the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ are 1:1 maps.


5.7 Emulation of Systems

In this section we apply some of the theoretical developments from the
previous sections. Emulation will be defined and can be viewed as an
application of quasimorphism. The definition of emulation here is similar to
the one used in [FiF82] in the analysis of quotient networks. Examples of
arbitrary system emulation are given in detail at the end of the section.




Definition  5.7.1: 

Let $S^1 \in S[V_I^1 \times V_O^1]$ and $S^2 \in S[V_I^2 \times V_O^2]$ be two systems. The
emulation of $S^1 \in S[V_I^1 \times V_O^1]$ by $S^2 \in S[V_I^2 \times V_O^2]$ can be viewed as a
two step procedure:

(1) Find a relabeling and reduction (that preserves the basic structure)
of $S^1$ using quasimorphism.

(2) Find the subsystem type of the quasimorphism of $S^1$, $\bar{\psi}(S^1)$, in $S^2$.

Definition 5.7.2:

Let $S^1 \in S[V_I^1 \times V_O^1]$ and $S^2 \in S[V_I^2 \times V_O^2]$ be two systems.

If $\bar{\psi}(S^1) \subseteq_a S^2$, then it is called emulation type a.

If $\bar{\psi}(S^1) \subseteq_b S^2$, then it is called emulation type b.

If $\bar{\psi}(S^1) \subseteq_c S^2$, then it is called emulation type c.

Physical implications: Let $S^1 = \langle C^1, C_F^1 \rangle = \langle \{C_m^1 \mid m=1,2,..p\}, C_F^1 \rangle$ and
$S^2 = \langle C^2, C_F^2 \rangle = \langle \{C_n^2 \mid n=1,2,..q\}, C_F^2 \rangle$ be two systems. If
$\bar{\psi}(S^1) \subseteq_a S^2$ then the system $S^2$ can emulate system $S^1$ as follows. The
movement of the data is accomplished (a) by using the network
correspondences $\{C_n^2 \mid n=1,2,..q\}$, and (b) by using the feedback or internal
connection of the device connected to both input and output of the network.
This type of emulation always exists if the $S^2$ system is partially or fully
recirculating. If the system $S^2$ is partially or fully recirculating then
$\exists \langle v_x, v_y \rangle \in C_F^2$. Then using the maps $\phi_I(v_i) = v_x$, $\forall v_i \in V_I^1$, and
$\phi_O(v_j) = v_y$, $\forall v_j \in V_O^1$, will satisfy the necessary conditions for an
emulation of type a. This, however, will result in a very poor computational
load balance. A great improvement in the computational load balance will
result if the quasimorphism is 1:1. Then each device in $\bar{\psi}(S^1)$ (the image of
$S^1$ under $\bar{\psi}$) will have the same amount of computation (data) as the
corresponding device in $S^1$.

Physical implications: If $\bar{\psi}(S^1) \subseteq_b S^2$ then the system $S^2$ can emulate system
$S^1$. The movement of the data is accomplished by using the network
correspondences $\{C_n^2 \mid n=1,2,..q\}$. This type of emulation is harder to achieve
than type a since the $C_F^2$ contribution cannot be used to move the data.
Again, as in type a, the load balance will greatly improve if the
quasimorphism is 1:1. If the quasimorphism is 1:1 then the load balancing as
well as the utilization in the image of $S^1$ in $S^2$ will be identical to that
in $S^1$.

Physical implications: Emulation type c. Since it is required in type b that
$\forall C_m^1 \in C^1 \; \exists C_n^2 \ni: C_m^1 \subseteq C_n^2$, there may be some side effects caused by $C_n^2$
emulating the correspondence $C_m^1$. Moreover, these uncontrolled side effects
will not allow partitions to operate independently. That is, connections that
are part of $C_n^2$, but not part of $C_m^1$, may be established when $C_n^2$ is used to
emulate $C_m^1$. This may or may not be a problem. To analyze this potential
problem, type c was created. With a type c emulation, when the system $S^2$
emulates system $S^1$, the movement of the data is accomplished by a subset of
$C^2$. The difference between type b and type c is that in type c, $\forall C_m^1 \in C^1$
$\exists C_n^2 \in C^2 \ni: C_n^2 = C_m^1$. This requirement eliminates the side effects that
type b has. More importantly, it means that $\bar{\psi}(S^1)$ is actually an autonomous
subsystem of $S^2$. The autonomous property will be exploited further in later
chapters studying partitionability.

The  following  two  examples  illustrate  two  types  of  emulation  where  in  the 
first  the  quasimorphism  is  not  1:1  and  the  second  has  quasimorphism  1:1. 


Example  5.7.3: 

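Consider the following two systems $S^1 = \langle C^1, C_F^1 \rangle$, $S^4 = \langle C^4, C_F^4 \rangle$;
see Figure 5.2 for $S^1$ and Figure 5.4 for $S^4$.

$S^1 = \langle C^1, C_F^1 \rangle$,
$V_I^1 = \{u_1, u_2, u_3\}$, $V_O^1 = \{v_1, v_2, v_3\}$,
$C_F^1 = \emptyset$, $C^1 = \{C_1^1, C_2^1, C_3^1\}$,
$C_1^1 = \{\langle u_1, v_1 \rangle, \langle u_1, v_2 \rangle\}$, $C_2^1 = \{\langle u_3, v_2 \rangle, \langle u_3, v_3 \rangle\}$,
$C_3^1 = \{\langle u_1, v_1 \rangle, \langle u_2, v_2 \rangle\}$.

$S^4 = \langle C^4, C_F^4 \rangle$,
$V_I^4 = \{w_1, w_2, w_3\}$, $V_O^4 = \{x_1, x_2, x_3\}$,
$C_F^4 = \emptyset$, $C^4 = \{C_1^4, C_2^4, C_3^4\}$,
$C_1^4 = \{\langle w_1, x_1 \rangle, \langle w_1, x_2 \rangle\}$, $C_2^4 = \{\langle w_2, x_2 \rangle, \langle w_2, x_3 \rangle\}$,
$C_3^4 = \{\langle w_1, x_1 \rangle, \langle w_2, x_2 \rangle, \langle w_3, x_3 \rangle\}$.

Find an emulation from $S^1$ to $S^4$.

Solution:

Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$, $\phi_I(u_1) = w_1$, $\phi_I(u_2) = w_2$, $\phi_I(u_3) = w_2$, $\phi_O(v_1) = x_1$,
$\phi_O(v_2) = x_2$, $\phi_O(v_3) = x_3$, as in Example 5.6.12.

$\bar{\psi}(S^1) = \langle \{\mu(C_m^1) \mid m=1,2,3\}, \mu(C_F^1) \rangle = S^2$ (see Figure 5.4 for $S^2$).
Therefore $\bar{\psi}$ is a quasimorphism from $S^1$ to $S^2$. Since $\phi_I(u_2) = \phi_I(u_3)$,
$\bar{\psi}$ is not 1:1.

Since $C_1^2 = C_1^4$, $C_2^2 = C_2^4$, and $C_3^2 \subseteq C_3^4$, therefore $\bar{\psi}(S^1) \subseteq_b S^4$, and this is
emulation of type b, not 1:1.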


Example  5.7.4: 


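Consider the following two systems $S^1 = \langle C^1, C_F^1 \rangle$, $S^4 = \langle C^4, C_F^4 \rangle$;
see Figure 5.3 for $S^1$ and Figure 5.5 for $S^4$.

$S^1 = \langle C^1, C_F^1 \rangle$,
$V_I^1 = \{u_1, u_2, u_3\}$, $V_O^1 = \{v_1, v_2, v_3\}$,
$C_F^1 = \emptyset$, $C^1 = \{C_1^1, C_2^1, C_3^1, C_4^1\}$,
$C_1^1 = \{\langle u_1, v_1 \rangle, \langle u_2, v_2 \rangle\}$,
$C_2^1 = \{\langle u_1, v_2 \rangle, \langle u_2, v_1 \rangle\}$, $C_3^1 = \{\langle u_3, v_3 \rangle\}$,
$C_4^1 = \{\langle u_2, v_3 \rangle, \langle u_3, v_2 \rangle\}$.

$S^4 = \langle C^4, C_F^4 \rangle$,
$V_I^4 = \{w_1, w_2, w_3, w_4\}$, $V_O^4 = \{x_1, x_2, x_3, x_4\}$,
$C_F^4 = \emptyset$, $C^4 = \{C_1^4, C_2^4, C_3^4\}$,
$C_1^4 = \{\langle w_1, x_1 \rangle, \langle w_2, x_2 \rangle, \langle w_3, x_3 \rangle, \langle w_4, x_4 \rangle\}$,
$C_2^4 = \{\langle w_1, x_2 \rangle, \langle w_2, x_1 \rangle, \langle w_3, x_4 \rangle, \langle w_4, x_3 \rangle\}$,
$C_3^4 = \{\langle w_1, x_3 \rangle, \langle w_2, x_4 \rangle, \langle w_3, x_1 \rangle, \langle w_4, x_2 \rangle\}$.

Find an emulation from $S^1$ to $S^4$.

Solution:

Let $\bar{\psi} = \langle \phi_I, \phi_O \rangle$, $\phi_I(u_1) = w_2$, $\phi_I(u_2) = w_1$, $\phi_I(u_3) = w_3$, $\phi_O(v_1) = x_2$,
$\phi_O(v_2) = x_1$, $\phi_O(v_3) = x_3$, as in Example 5.6.13.

$\bar{\psi}(S^1) = \langle \{\mu(C_m^1) \mid m=1,2,3,4\}, \mu(C_F^1) \rangle = S^2$ (see Figure 5.5 for $S^2$).
Therefore $\bar{\psi}$ is a quasimorphism from $S^1$ to $S^2$.

Since $\forall u_a \ne u_b \in V_I^1$, $\phi_I(u_a) \ne \phi_I(u_b)$ and $\forall v_a \ne v_b \in V_O^1$, $\phi_O(v_a) \ne \phi_O(v_b)$,
therefore $\bar{\psi}$ is 1:1. Since $C_1^2 \subseteq C_1^4$, $C_2^2 \subseteq C_2^4$, $C_3^2 \subseteq C_1^4$, and $C_4^2 \subseteq C_3^4$,
therefore $\bar{\psi}(S^1) \subseteq_b S^4$, and this is emulation of type b, 1:1.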




Suppose  there  is  a  quasimorphum  such  that  V'HS*)  =  and 
0'(S^)  =  and  0*  is  1:1.  First,  this  means  that  S'  =  S^  since  0*  is  1:1. 
Second, and more important from an engineering point of view, the 1:1 property
guarantees an efficient emulation of $S^1$ by $S^2$. That is, if all nodes of $V_I$ were
connected to processors and all nodes of $V_O$ to memories, the emulation would be such
that the processing work of one processor in $S^1$ would be exactly equal to the
processing work of one processor in the image of $S^1$ in $S^2$. Also, the amount of data
stored in a single memory unit in $S^1$ would be exactly equal to the amount of data
stored in a memory unit in the image of $S^1$ in $S^2$. In other words, the mapping is
regular in some sense. Analogously, the load balancing and utilization in the
image of $S^1$ in $S^2$ will be identical to that in $S^1$. The quasimorphism can be
used to map multiple copies of system $S^1$ into $S^2$, where $\psi^1(S^1) \cap \psi^2(S^1) = \emptyset$ is
a necessary additional constraint. This will allow tandem cross checking of
partial results of a computation and therefore can be used as an error detection
mechanism for fault tolerance.

In  order  to  evaluate  the  efficiency  and  uniformity  of  the  emulation  the 
following  criteria  will  be  used. 

Definition  5.7.5: 

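Let $S^1 \in S[V_I^1 \times V_O^1]$ and $S^2 \in S[V_I^2 \times V_O^2]$ be two systems. Let $\psi$
be a quasimorphism such that $\psi(S^1) \subset_{(a,b,c)} S^2$. Define:

input node factor: $inf = \max\{|\phi_I^{-1}(u_a)| \mid u_a \in \phi_I(V_I^1)\}$.

output node factor: $onf = \max\{|\phi_O^{-1}(u_a)| \mid u_a \in \phi_O(V_O^1)\}$.

side effect: $se = $ yes iff $\exists\, C_m^3 \in C^3$ such that $C_m^3 \subset C_n^2$, where $C^3$ is the set of correspondences of the image $\psi(S^1)$ and $C_n^2 \in C^2$.

For the detailed meaning of these factors see the conclusions of this chapter.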
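The three criteria can be evaluated directly for a finite example. The sketch below is illustrative only: the helper names `node_factor` and `side_effect` are hypothetical, the data are those of Example 5.7.3 (with the last edge of $C_3^2$ read as $\langle w_3, x_3 \rangle$), and the side-effect test assumes that an exactly matching host correspondence is used whenever one exists.

```python
from collections import Counter


def node_factor(mapping):
    """max |phi^-1(w)| over all image nodes w of a finite map given as a dict."""
    return max(Counter(mapping.values()).values())


def side_effect(image_corrs, host_corrs):
    """True iff some image correspondence has no exact match in the host and
    is only a proper subset of a host correspondence (assumed reading of se)."""
    return any(
        img not in host_corrs and any(img < host for host in host_corrs)
        for img in image_corrs
    )


# Node maps of Example 5.7.3.
phi_I = {"u1": "w1", "u2": "w2", "u3": "w2"}
phi_O = {"v1": "x1", "v2": "x2", "v3": "x3"}

# Image correspondences (C^3) and host correspondences (C^2) of the same example.
C3 = [
    {("w1", "x1"), ("w1", "x2")},
    {("w2", "x2"), ("w2", "x3")},
    {("w1", "x1"), ("w2", "x2")},
]
C2 = [
    {("w1", "x1"), ("w1", "x2")},
    {("w2", "x2"), ("w2", "x3")},
    {("w1", "x1"), ("w2", "x2"), ("w3", "x3")},
]

inf = node_factor(phi_I)
onf = node_factor(phi_O)
se = side_effect(C3, C2)
print(f"inf = {inf}, onf = {onf}, se = {'yes' if se else 'no'}")
```

Here two input nodes share the image $w_2$, so the processor at $w_2$ carries the work of two emulated processors, while every memory node has a single preimage.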


Let $S^1 \in S[V_I^1 \times V_O^1]$ and $S^2 \in S[V_I^2 \times V_O^2]$ be two systems. Let $\psi$ be
a quasimorphism such that $\psi(S^1) \subset_{(a,b,c)} S^2$ or $\psi(S^1) = S^2$. The
comparison of efficiency of different types of emulation is shown in Table 5.1.


5.8  Conclusions


In this chapter several problems have been discussed. The problem of
comparison of topologically arbitrary systems was rigorously formulated and
analyzed using a new concept called quasimorphism. Each system is defined
over an underlying set $V_I \times V_O$. The set of all systems over the
underlying $V_I \times V_O$ is called the S-set over $V_I \times V_O$. Then the
problem of comparison of systems can be formulated as finding relationships
between two S-sets. The problem is very complex and therefore was broken
down into two major steps. First the T-set over $V_I \times V_O$ was defined. The T-set
has fewer constraints than the S-set over the same $V_I \times V_O$ and therefore it is
easier to analyze relationships between T-sets than between S-sets. Auxiliary
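maps $\phi_I$, $\phi_O$, $\phi_{I \times O}$, and $\psi$ were defined and it was shown that the $\phi_I$-map $\phi_I$ and
the $\phi_O$-map $\phi_O$ uniquely determine the $\psi$-map $\psi$. Conversely, the $\psi$-map $\psi$ uniquely
determines the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$. Informally, the $\psi$-map $\psi$ is a measure of
similarity between T-elements. It was shown that certain properties of the $\phi_I$-map
$\phi_I$ and the $\phi_O$-map $\phi_O$ are inherited by the $\psi$-map $\psi$. In particular, if the $\phi_I$-map $\phi_I$ and
the $\phi_O$-map $\phi_O$ are 1:1 maps then so is the $\psi$-map $\psi$. Conversely, if the $\psi$-map $\psi$ is 1:1
then the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ are also 1:1 maps.

Table 5.1:

Comparison of efficiency of different types of emulation.

A. $\psi$ not 1:1; $S^1$, $S^2$

    Type                    inf        onf        se
    $\subset_a$, not b      $\ge 1$    $\ge 1$    YES
    $\subset_b$, not c      $\ge 1$    $\ge 1$    YES
    $\subset_c$, not =      $\ge 1$    $\ge 1$    NO

B. $\psi$ 1:1; $S^1$, $S^2$

    Type                    inf        onf        se
    $\subset_a$, not b      1          1          YES
    $\subset_b$, not c      1          1          YES
    $\subset_c$, not =      1          1          NO
    =                       1          1          NO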

In the next section, the relationships between two S-sets were studied.
Using the maps $\phi_I$, $\phi_O$, $\phi_{I \times O}$, and $\psi$ in the T-set domain, new
correspondences $\rho$ and $\psi$ were defined in the S-set domain. Informally, $\psi$ is a
measure of similarity between two S-sets. As expected and intended, some
behavior of $\phi_I$, $\phi_O$, and $\phi_{I \times O}$ is inherited by the $\rho$-correspondence $\rho$ and the
$\psi$-correspondence $\psi$. For example, if the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ are 1:1 maps
then the $\psi$-correspondence $\psi$ is a 1:1 correspondence. Conversely, if the $\psi$-correspondence
$\psi$ is 1:1 then the $\phi_I$-map $\phi_I$ and the $\phi_O$-map $\phi_O$ are 1:1 maps. Properties of the
$\psi$-correspondence $\psi$ similar to the reflexive, symmetric, and transitive properties
of relations were discussed; in particular the following were shown.

Let $S^1$, $S^2$, and $S^3$ be three systems. The quasimorphism has the following
properties.

(1) $\exists\, \psi$ such that $\psi(S^1) = S^1$.

(2) $\psi^1(S^1) = S^2 \rightarrow \exists\, \psi^2$ such that $\psi^2(S^2) = S^1$.

(3) $\psi^1(S^1) = S^2$ and $\psi^2(S^2) = S^3 \rightarrow \exists\, \psi$, $\psi(S^1) = S^3$.

Let $S^1$, $S^2$, and $S^3$ be three systems. The 1:1 quasimorphism has the
following properties.

(1) $\exists\, \psi$, 1:1, such that $\psi(S^1) = S^1$.

(2) $\psi^1(S^1) = S^2$, 1:1 $\rightarrow \exists\, \psi^2$, 1:1, such that $\psi^2(S^2) = S^1$.

(3) $\psi^1(S^1) = S^2$, 1:1, and $\psi^2(S^2) = S^3$, 1:1 $\rightarrow \exists\, \psi$, 1:1, $\psi(S^1) = S^3$.

The  quasimorphism  measure  provides  the  necessary  theoretical  background 
for  studying  the  following  problems  of  parallel  processing. 


(a)  Emulation of system $S^1$ by system $S^2$.

(b)  Fault tolerance method achieved by a concurrent execution of multiple
copies of the same problem.

(c)  Partitioning  of  a  system. 

Three  types  of  emulation  were  defined  based  upon  the  subsystem 
relationship  between  the  image  of  the  emulated  system  and  the  host  system. 
Several  measures  of  efficiency  of  the  emulation  based  upon  the  preservation  of 
the  computational  loading  and  other  factors  were  defined  and  the  emulation 
types were evaluated on that basis. Suppose the system $S^1$ consists of
processors connected to $V_I^1$ and memory units connected to $V_O^1$. If the
input node factor = 1, then the amount of computation performed in the host
system node $\phi_I(u_a) \in V_I^2$ is the same as the amount of computation performed
in the node $u_a \in V_I^1$. If $inf > 1$, that means $\exists\, u_a, u_b \in V_I^1$ and $w_a \in V_I^2$ such
that $\phi_I(u_a) = w_a$ and $\phi_I(u_b) = w_a$. That implies the processor connected to $w_a$
in $\psi(S^1)$ must perform the computation of the processors connected to the
nodes $u_a$ and $u_b$ in $S^1$. If the output node factor = 1, then the amount of data
stored in the memory unit connected to $\phi_O(v_a) \in V_O^2$ is the same as in the
memory unit connected to $v_a \in V_O^1$ in $S^1$. If $onf > 1$, then $\exists\, v_a, v_b \in V_O^1$ and
$x_a \in V_O^2$ such that $\phi_O(v_a) = x_a$ and $\phi_O(v_b) = x_a$. That implies the memory unit
connected to $x_a$ in $\psi(S^1)$ must contain the data contained in both memory
units connected to $v_a$ and $v_b$ in $S^1$. Side effects exist if the correspondence
$C_n^2 \in C^2$ that is used to emulate the correspondence $C_m^1 \in C^1$ has the
property $C_m^3 \subset C_n^2$, where $C_m^3 = \rho(C_m^1)$. This causes $C_n^2$ to move some additional data that
$C_m^1$ did not move.


In this chapter, the horizontal composition and decomposition of single
stage interconnection networks will be analyzed [SeS85]. The general model of
interconnection networks, defined earlier, will be used in the analysis.

Using the horizontal composition/decomposition, the partitionability
property of interconnection networks will be defined. Informally, the
partitionability property means that the network can be divided into several
parts, each of which has a certain degree of independence. The type of
partitionability analyzed in this chapter uses all the states for consideration of
partitionability and has three subtypes.

An  algorithm  is  developed  which  will  output  one  of  the  following: 

(1)  The network is not partitionable.

(2)  The network is partitionable into subnetworks with common control
signals and the combination of the subnetworks will exactly
generate all interconnection patterns of the original network.

(3)  The  network  is  partitionable  into  subnetworks  with  separate  control 
signals  and  the  combination  of  the  subnetworks  will  exactly  generate  all 
interconnection  patterns  of  the  original  network. 

(4)  The  network  is  partitionable  into  subnetworks  with  separate  control 
signals  and  the  combination  of  the  subnetworks  will  generate  a  superset 
of  interconnection  patterns  of  the  original  network. 

The  algorithm  is  network  topology  independent  and  can  be  used  to 
analyze  topologically  regular  and  irregular  single  stage  networks. 


The partitionability property of interconnection networks for parallel
computer systems is important for the following reasons.

(1)  If the network is partitionable then the system can on demand easily
allocate only a subset of total resources. This can be used in several
different ways as shown below.

(a)  A  user  can  use  only  a  small  part  of  the  machine  for  program 
development. 

(b)  In  a  multiple  user  environment  the  partitioning  provides  a 
natural  protection  among  users. 

(c)  In  a  multitasking  environment  the  partitioning  provides  a 
protection  among  independent  tasks. 

(2)  If  the  network  is  partitionable  the  fault  tolerance  of  the  system  increases 
as  follows. 

(a)  A  method  of  graceful  degradation  is  possible  by  separating  the 
faulty  section  from  the  correctly  operating  ones. 

(b)  If  in  addition  to  being  a  partitionable  network,  the  sections  are 
isomorphic,  then  an  increase  of  reliability  may  be  realized  by 
multiple  mappings  of  the  same  task  onto  the  multiple  sections 
and  tandem  cross  checking  of  partial  results. 

(c)  It  is  possible  to  construct  a  fault  tolerant  network  using  a 
partitionable  network  as  a  core. 

(3)  If  the  network  is  partitionable,  then  there  is  an  efficient  implementation 
in  terms  of  hardware  and  control.  The  network  can  be  implemented  as 
a  set  of  network  components  each  with  its  own  set  of  inputs  and 
outputs. The data path layout and, under some conditions, also the
control layout are simplified on a VLSI substrate or on a printed circuit
board (PCB).


6.2  Overview 


In Section 6.3 the problem discussed in this chapter is informally defined.
In Section 6.4 the previous work on partitionability is briefly described. In
Section 6.5 some basic concepts are defined. In Section 6.6 the horizontal
composition and decomposition of single stage interconnection networks are
formally defined and analyzed. In Section 6.7 an algorithm is presented and
proven correct that accepts as input a topologically arbitrary
interconnection network and outputs one of the following four outcomes: the
network is not partitionable, or the network is partitionable in one of the three
types.


6.3  Problem  Statement 


In this section the problem of partitionability of single stage
interconnection networks will be analyzed [SeS85]. There is a large amount of
work done on this subject for a certain class of interconnection networks, namely
topologically regular networks [Gok76, GoL73, Sie80, Upp81]. The work here
is different in two respects from the previous studies. First, the topology under
discussion here is completely unrestricted and the results apply to regular
as well as irregular interconnection networks. Second, the set of states used in
the consideration of partitionability here includes all the states of the network,
whereas the previous work used only a subset of the states (this will be
discussed further in later chapters). In our work, partitionability will be
defined and three different types of partitionability will be recognized. Then an
algorithm which accepts as input a topologically arbitrary interconnection
network and produces one of four possible outputs will be presented. The
outputs are as follows: (a) the network is not partitionable, (b) the network is
partitionable into two networks with dependent controls, (c) the network is
partitionable into two networks with independent controls where the
combination produces the original network exactly, and (d) the network is
partitionable into two networks with independent controls where the
combination produces a superset of states of the original network.


6.4  Previous  Work 


The partitionability of topologically regular networks has been studied
extensively in the literature. It was shown in [Sie80] that single stage and
multistage Cube networks are partitionable, as are PM2I and ADM. It was
also shown in [Sie80] that the Illiac and Shuffle-Exchange are not partitionable.
The analysis in [Sie80] was based upon the cycle structure of the permutations
admissible by the network under analysis. In [Upp81] the partitionability of
regular SW banyans was discussed, and in [Gok76, GoL73] the partitionability
of banyan networks was shown. All these networks are topologically regular
and partitionability of arbitrary networks was not studied in the literature.
The partitionability discussed in this chapter is different from the type
discussed in the previous work in two respects: it considers the participation of
all the states of the network, where the type studied previously considered only
a subset of the states of the network, and it is applicable to networks with
arbitrary, regular and irregular topology.


6.6  Composition and Decomposition of Networks


This section describes a “horizontal” composition and decomposition of
single stage networks. The discussion here is presented for the composition of
two networks into one and the decomposition of one network into two.
However, it can be generalized into the composition of n networks into one and
decomposition of one network into n, n > 2. What is meant by the horizontal
composition of two networks $K^1$ and $K^2$ is that $V_I^1 \cap V_I^2 = \emptyset$ and
$V_O^1 \cap V_O^2 = \emptyset$. Similarly, the horizontal decomposition of $K$ into two networks
$K^1$ and $K^2$ will result in $V_I^1 \cap V_I^2 = \emptyset$ and $V_O^1 \cap V_O^2 = \emptyset$. Two types of
composition (decomposition) are described. One, the $\sigma$-composition
(decomposition), corresponds to the physical situation where the controls of the
individual subnetworks of the network are independent. The other type is the
$\tau$-composition (decomposition), which corresponds to the physical situation
where the controls of the individual subnetworks of the network are dependent
upon one another.

This section conceptually consists of two parts. In part one the definition
of the $\sigma$-composition is given and some of its basic properties are presented. In
part two the definition of the $\tau$-composition is given and its properties are
described.

Definition  6.6.1: 

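Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$,
$K^2 = \langle C^2 \rangle$, be two networks such that: $(V_I^1 \cup V_O^1) \cap (V_I^2 \cup V_O^2) = \emptyset$.
Define the $\sigma$-map as follows: $K^1 \sigma K^2 = \langle C^1 \rangle \sigma \langle C^2 \rangle \triangleq \langle \{C_p^1 \cup C_r^2 \mid C_p^1 \in C^1, C_r^2 \in C^2\} \rangle$.

This describes the composition of two networks where the controls of the two
networks are independent from one another. Lemmas and Theorems 6.6.2
to 6.6.4 discuss the properties of the $\sigma$-map composition of networks.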
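As a concrete illustration of Definition 6.6.1, the following Python sketch forms the set of all unions $C_p^1 \cup C_r^2$. It is only a sketch under stated assumptions: a correspondence is modeled as a `frozenset` of edge tuples, the helper names `nodes` and `sigma` are hypothetical, and the two small networks are invented for the example.

```python
from itertools import product


def nodes(corrs):
    """All input and output nodes touched by a set of correspondences."""
    return {n for corr in corrs for edge in corr for n in edge}


def sigma(c1, c2):
    """sigma-composition of <c1> and <c2>: all unions of one state from each."""
    if nodes(c1) & nodes(c2):
        raise ValueError("the two networks must not share nodes")
    return {p | r for p, r in product(c1, c2)}


# Hypothetical two-state networks on disjoint node sets.
C1 = {frozenset({("a1", "b1")}), frozenset({("a1", "b2")})}
C2 = {frozenset({("c1", "d1")}), frozenset({("c2", "d1")})}

C3 = sigma(C1, C2)
print(len(C3))               # number of states of the composed network
print(C3 == sigma(C2, C1))   # the composition does not depend on the order
```

Because the node sets are disjoint, distinct pairs of states give distinct unions, so the composed network has one state per pair.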

Lemma 6.6.2:

Let $K^1 \in K[V_I^1 \times V_O^1]$ and $K^2 \in K[V_I^2 \times V_O^2]$ be two networks such that:
$(V_I^1 \cup V_O^1) \cap (V_I^2 \cup V_O^2) = \emptyset$. Then $K^1 \sigma K^2 = K^2 \sigma K^1$.

Proof:

Obvious from the definition of the $\sigma$-map and the commutativity property of set
union.


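Theorem 6.6.3:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks such that: $(V_I^1 \cup V_O^1) \cap (V_I^2 \cup V_O^2) = \emptyset$. Then
$K^1 \sigma K^2 \in K[(V_I^1 \cup V_I^2) \times (V_O^1 \cup V_O^2)]$.

Proof:

(1): Let $\{C_p^1 \cup C_r^2 \mid C_p^1 \in C^1, C_r^2 \in C^2\} = C^3$, let $C_q^3 \in C^3$. Let
$C[(V_I^1 \cup V_I^2) \times (V_O^1 \cup V_O^2)] = C^4$.

(2): Show $C^3 \subseteq C^4$.

(2.1): Clearly $C_q^3 \in P[(V_I^1 \cup V_I^2) \times (V_O^1 \cup V_O^2)]$. Must show
nondestructivity.

(2.1.1): $\langle u_a, u_b \rangle, \langle u_c, u_d \rangle \in C_q^3 \rightarrow$ three cases.

(2.1.2): $\langle u_a, u_b \rangle, \langle u_c, u_d \rangle \in C_p^1$, $C_p^1 \in C^1 \rightarrow u_b \ne u_d$.

(2.1.3): $\langle u_a, u_b \rangle, \langle u_c, u_d \rangle \in C_r^2$, $C_r^2 \in C^2 \rightarrow u_b \ne u_d$.

(2.1.4): $\langle u_a, u_b \rangle \in C_p^1$, $C_p^1 \in C^1$ and $\langle u_c, u_d \rangle \in C_r^2$, $C_r^2 \in C^2$,
$(V_I^1 \cup V_O^1) \cap (V_I^2 \cup V_O^2) = \emptyset \rightarrow V_O^1 \cap V_O^2 = \emptyset \rightarrow u_b \ne u_d$.

(2.1.5): (2.1.2), (2.1.3), and (2.1.4) $\rightarrow C_q^3 \in C^4 \rightarrow C^3 \subseteq C^4$.

(3): Show $s(C^3) = V_I^1 \cup V_I^2$.

(3.1): $s(C^3) = s(\{C_p^1 \cup C_r^2 \mid C_p^1 \in C^1, C_r^2 \in C^2\}) =
\{s(C_p^1) \cup s(C_r^2) \mid C_p^1 \in C^1, C_r^2 \in C^2\} = \{s(C_p^1) \mid C_p^1 \in C^1\} \cup
\{s(C_r^2) \mid C_r^2 \in C^2\} = s(C^1) \cup s(C^2) = V_I^1 \cup V_I^2$.

(4): Show $d(C^3) = V_O^1 \cup V_O^2$.

(4.1): Similar to (3.1) except replace the s set by the d set.

(5): Show $|C^3| \ge 2$.

(5.1): $|C^3| = |\{C_p^1 \cup C_r^2 \mid C_p^1 \in C^1, C_r^2 \in C^2\}|$.

(5.2): $C_p^1 \cup C_r^2 \ne C_s^1 \cup C_t^2$, $p \ne s$ or $r \ne t \rightarrow$ all $C_q^3$ are distinct.

(5.3): (5.1), (5.2) $\rightarrow |C^3| = |C^1| \cdot |C^2| \ge 2 \cdot 2 = 4$.

□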

Lemma 6.6.4:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^2 \in K[V_I^2 \times V_O^2]$, and $K^3 \in K[V_I^3 \times V_O^3]$ be
three networks such that $(V_I^a \cup V_O^a) \cap (V_I^b \cup V_O^b) = \emptyset$, $a \ne b$, $a, b = 1, 2, 3$;
then $(K^1 \sigma K^2) \sigma K^3 = K^1 \sigma (K^2 \sigma K^3)$.

Proof:

Obvious from the definition of the $\sigma$-map and the associativity property of
set union.


The next three definitions introduce the technical nomenclature used in
this chapter.

Definition  6.6.5: 

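Let $K \in K[V_I \times V_O]$ be a network. Let $\{K^1, K^2, \ldots, K^n \mid K^i \in K[V_I^i \times V_O^i]\}$
be a set of networks such that: $K = K^1 \sigma K^2 \sigma \cdots \sigma K^n$. Then

(1)  $K^1 \sigma K^2 \sigma \cdots \sigma K^n$ is called a $\sigma$-decomposition of $K$.

(2)  $\{K^1, K^2, \ldots, K^n\}$ is called a $\sigma$-decomposition set of $K$.

(3)  $K^i$ is called a $\sigma$-decomposition element of $K$.

(4)  $K$ is the $\sigma$-composition of $K^1 \sigma K^2 \sigma \cdots \sigma K^n$.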

Definition  6.6.6: 

Let $K \in K[V_I \times V_O]$ be a network. If the only possible
$\sigma$-decomposition is $K = K^1$ then $K$ is called a $\sigma$-prime network.

Definition 6.6.7:

Let $K \in K[V_I \times V_O]$ be a network and let $K = K^1$. Then $K^1$ is called
the trivial $\sigma$-decomposition of $K$.

Lemma 6.6.8:

Let $K \in K[V_I \times V_O]$ be a network. Then $K$ has a $\sigma$-decomposition.

Proof:

Let $K = K^1$ be the trivial $\sigma$-decomposition of $K$.

□


Definition  6.6.9: 

Let $K \in K[V_I \times V_O]$ be a network. Let $K = K^1 \sigma K^2 \sigma \cdots \sigma K^n$ be a $\sigma$-composition,
where $\forall i$, $K^i$ is a $\sigma$-prime network. Then
$K^1 \sigma K^2 \sigma \cdots \sigma K^n$ is called a $\sigma$-composition prime of $K$.

Notice that this implies $V_I = \bigcup_{i=1}^{n} V_I^i$ and $V_O = \bigcup_{i=1}^{n} V_O^i$.

Theorems 6.6.10 to 6.6.12 discuss some properties of the $\sigma$-decomposition of
networks.

Theorem  6.6.10: 

Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$, be a network. Let $K =
K^1 \sigma K^2 \sigma \cdots \sigma K^n$ be any $\sigma$-decomposition. Then: $n \le \log_2 |C|$.

Proof:

Let $K^j = \langle C^j \rangle$.

(1): $K^j$ is a network $\rightarrow |C^j| \ge 2$.

(2): $|C| = \prod_{j=1}^{n} |C^j| \ge 2^n$.

(3): $n = \log_2 2^n \le \log_2 |C|$.

□

This can be used as an upper bound on the number of networks in a $\sigma$-decomposition
set.


Theorem  6.6.11: 


Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$, be a network.

(1)  If $K$ has a nontrivial $\sigma$-decomposition then $|C|$ is not a prime
number.

(2)  If $|C|$ is a prime number then $K$ does not have a nontrivial $\sigma$-decomposition.

Proof:

Follows from the proof of Theorem 6.6.10.

The counting principle introduced above can be used as a necessary condition
on a $\sigma$-decomposition of a network.
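The two counting results can be turned into a quick pre-check before any structural analysis is attempted. The sketch below is a hedged illustration: the function names are hypothetical, and the test is only the necessary condition stated above, so a positive answer does not guarantee that a nontrivial $\sigma$-decomposition exists.

```python
import math


def max_decomposition_size(num_states):
    """Largest n allowed by n <= log2 |C| (Theorem 6.6.10)."""
    if num_states < 2:
        raise ValueError("a network has at least two states")
    # For an integer m >= 1, floor(log2 m) equals m.bit_length() - 1.
    return num_states.bit_length() - 1


def is_prime(n):
    """Trial division; adequate for small state counts."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def may_decompose(num_states):
    """Necessary, not sufficient, condition of Theorem 6.6.11."""
    return num_states >= 4 and not is_prime(num_states)


for states in (2, 7, 12, 16):
    print(states, max_decomposition_size(states), may_decompose(states))
```

A network with 7 states can be rejected immediately, while one with 12 states could at most split into three $\sigma$-decomposition elements.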

Theorem  6.6.12: 

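Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$,
$K^2 = \langle C^2 \rangle$, be two networks such that: $(V_I^1 \cup V_O^1) \cap
(V_I^2 \cup V_O^2) = \emptyset$. Let $K^3 = K^1 \sigma K^2$ be a $\sigma$-composition.

(1)  If $\emptyset_C \in C^2$ then $K^1 \subset_c K^3$, where $\emptyset_C$ is the correspondence
consisting of no edges, i.e., no connections between the set of
inputs and the set of outputs.

(2)  If $\emptyset_C \notin C^2$ then $K^1 \subset_b K^3$, but not $K^1 \subset_c K^3$.

(3)  If $\emptyset_C \in C^1$ then $K^2 \subset_c K^3$.

(4)  If $\emptyset_C \notin C^1$ then $K^2 \subset_b K^3$, but not $K^2 \subset_c K^3$.

Proof:

Case 1: Show $K^1 \subset_c K^3$.

(1): $K^3 = K^1 \sigma K^2 \rightarrow \forall C_i^1 \in C^1, \exists\, C_j^3 \in C^3 \ni: C_j^3 = C_i^1 \cup C_k^2$.

(2): (1) and $\emptyset_C \in C^2 \rightarrow \forall C_i^1 \in C^1\ \exists\, C_j^3 \in C^3 \ni: C_j^3 = C_i^1
\rightarrow K^1 \subset_c K^3$.

Case 2: Show $K^1 \subset_b K^3$ but not $K^1 \subset_c K^3$.

(1): Same as Case 1.

(2): (1) and $\emptyset_C \notin C^2 \rightarrow (\forall C_i^1 \in C^1\ \exists\, C_j^3 \in C^3 \ni: C_i^1 \subset C_j^3)$
and $(\forall C_i^1 \in C^1\ \nexists\, C_j^3 \in C^3 \ni: C_j^3 = C_i^1) \rightarrow K^1 \subset_b K^3$ and
not $K^1 \subset_c K^3$.

Cases 3 and 4: Same as Cases 1 and 2 by the commutativity of the $\sigma$-composition
(Lemma 6.6.2).

□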

In the second part of this section, the $\tau$-composition and decomposition of
two networks will be discussed. This differs from the $\sigma$-composition
(decomposition) as follows. In the $\sigma$-composition, the two networks keep
independent controls, that is, if $C_p^1$ is selected in $K^1$, an arbitrary
correspondence $C_r^2$ can be selected in $K^2$. In the $\tau$-composition, the two
networks have joint control, that is, if $C_p^1$ is selected in $K^1$, the corresponding
$C_r^2$ must be selected in $K^2$.


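Definition 6.6.13:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
be two networks such that:

(a) $(V_I^1 \cup V_O^1) \cap (V_I^2 \cup V_O^2) = \emptyset$, and (b) $|C^1| = |C^2|$.

Define the $\tau_\alpha$-map as follows:

(1)  Define $\alpha: C^1 \rightarrow C^2$, a map that is 1:1 and onto.

(2)  $K^1 \tau_\alpha K^2 = \langle C^1 \rangle \tau_\alpha \langle C^2 \rangle \triangleq \langle \{C_p^1 \cup C_r^2 \mid \alpha(C_p^1) = C_r^2,
C_p^1 \in C^1, C_r^2 \in C^2\} \rangle$.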

This describes the composition of two networks where the controls are
dependent in the sense that choosing a $C_p^1$ in $C^1$ means $\alpha(C_p^1)$ must be selected
in $C^2$. Thus, the $\alpha$ map exactly specifies how the controls are dependent. The
basic difference between the $\sigma$-map and the $\tau_\alpha$-map is as follows. Suppose $K^1 =
\langle C^1 \rangle$ and $K^2 = \langle C^2 \rangle$. If $K^3 = K^1 \sigma K^2$, $K^3 = \langle C^3 \rangle$, then (a) $|C^3| =
|C^1| \cdot |C^2|$ and (b) $C_p^1$ is a subset of $|C^2|$ correspondences in $C^3$. If
$K^3 = K^1 \tau_\alpha K^2$ then (a) $|C^3| = |C^1| = |C^2|$ and (b) $C_p^1$ is a subset of one
correspondence in $C^3$, specifically $C_p^1 \cup \alpha(C_p^1)$.
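The contrast between the two compositions can be seen on a small example. In this hedged Python sketch, correspondences are `frozenset`s of edge tuples, the map $\alpha$ is given as a dictionary, and the function names `sigma` and `tau` as well as the two networks are assumptions made only for illustration.

```python
from itertools import product


def sigma(c1, c2):
    """Independent controls: every pair of states may be combined."""
    return {p | r for p, r in product(c1, c2)}


def tau(c1, c2, alpha):
    """Joint control: state p of c1 is always combined with alpha[p] of c2."""
    if len(c1) != len(c2) or set(alpha) != set(c1) or set(alpha.values()) != set(c2):
        raise ValueError("alpha must be a 1:1, onto map from c1 to c2")
    return {p | r for p, r in alpha.items()}


# Hypothetical two-state networks on disjoint node sets.
P1, P2 = frozenset({("a1", "b1")}), frozenset({("a1", "b2")})
R1, R2 = frozenset({("c1", "d1")}), frozenset({("c2", "d1")})
C1, C2 = {P1, P2}, {R1, R2}
alpha = {P1: R1, P2: R2}

S = sigma(C1, C2)
T = tau(C1, C2, alpha)
print(len(S), len(T))
# In how many composed states does P1 appear as a subset?
print(sum(P1 <= state for state in S), sum(P1 <= state for state in T))
```

Every state produced by `tau` is also produced by `sigma`, which is the sense in which dependent controls select a subset of the independently controlled states.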


Definitions 6.6.14 to 6.6.16 provide some nomenclature involving the
$\tau$-composition and decomposition of networks.

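Definition 6.6.14:

Let $K \in K[V_I \times V_O]$ be a network. Let $\{K^1, K^2, \ldots, K^n \mid K^i \in K[V_I^i \times V_O^i]\}$
be a set of networks such that: $K = K^1 \tau_\alpha K^2 \tau_\alpha \cdots \tau_\alpha K^n$. Then

(1)  $K^1 \tau_\alpha K^2 \tau_\alpha \cdots \tau_\alpha K^n$ is called a $\tau$-decomposition of $K$.

(2)  $\{K^1, K^2, \ldots, K^n\}$ is called a $\tau$-decomposition set of $K$.

(3)  $K^i$ is called a $\tau$-decomposition element of $K$.

(4)  $K$ is the $\tau$-composition of $K^1 \tau_\alpha K^2 \tau_\alpha \cdots \tau_\alpha K^n$.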

Definition  6.6.15: 

Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$, be a network. $K$ is a prime network
iff $K$ cannot be decomposed as $K \supset_c K^1 \sigma K^2$.

Definition  6.6.16: 

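Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$, be a network. If there exist
$K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1 \rangle$, and $K^2 \in K[V_I^2 \times V_O^2]$, $K^2 = \langle C^2 \rangle$,
two prime networks such that: (1) $V_I^1 \cup V_I^2 = V_I$, and (2)
$V_O^1 \cup V_O^2 = V_O$, then:

(1)  If $K^1 \tau_\alpha K^2 = K$, then $K$ is a $\tau$-partitionable network.

(2)  If $K^1 \sigma K^2 = K$, then $K$ is a strictly $\sigma$-partitionable network.

(3)  If $K^1 \sigma K^2 \ne K$ and $K^1 \sigma K^2 \supset_c K$, then $K$ is a $\sigma$-partitionable
network.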


Note that strictly $\sigma$-partitionable implies: $|C| = |C^1| \cdot |C^2|$ and
$C = \{C_s^1 \cup C_t^2 \mid C_s^1 \in C^1, C_t^2 \in C^2\}$. In contrast, $\sigma$-partitionable implies:
$|C| < |C^1| \cdot |C^2|$ and $C \subset \{C_s^1 \cup C_t^2 \mid C_s^1 \in C^1, C_t^2 \in C^2\}$.

If $K$ is a $\tau$-partitionable network then it is also $\sigma$-partitionable. It is not
strictly $\sigma$-partitionable because it is strictly $\sigma$-partitionable only if
$|C^1| \cdot |C^2| = |C|$ and it is $\tau$-partitionable only if $|C^1| = |C^2| = |C|$, which
together imply $|C^1| = |C^2| = |C| = 1$; however, $|C^1|, |C^2|, |C| \ge 2$, by Definition
4.6.1. Also note that if there exists a $\sigma$-prime composition of $K$, then $K$ is a
strictly $\sigma$-partitionable network.
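For a given candidate pair of subnetworks, the three cases can be told apart by comparing state sets. The sketch below is illustrative and rests on assumptions: it reads the type c containment as plain set containment of the correspondences (as in the note above), it does not test the primality required by Definition 6.6.16, and the name `classify` and the sample states are hypothetical.

```python
from itertools import product


def classify(c, c1, c2):
    """Partition types that <c> satisfies for the candidate pair <c1>, <c2>."""
    composed = {p | r for p, r in product(c1, c2)}
    # Pairs of states whose union is actually a state of the network.
    used = [(p, r) for p, r in product(c1, c2) if p | r in c]
    labels = []
    if c == composed:
        labels.append("strictly sigma-partitionable")
    elif c < composed:
        labels.append("sigma-partitionable")
    # tau: the used pairs must form a 1:1, onto map alpha from c1 to c2.
    is_bijection = (
        len(used) == len(c1) == len(c2)
        and {p for p, _ in used} == set(c1)
        and {r for _, r in used} == set(c2)
    )
    if c <= composed and is_bijection:
        labels.append("tau-partitionable")
    return labels


# Hypothetical two-state networks on disjoint node sets.
P1, P2 = frozenset({("a1", "b1")}), frozenset({("a1", "b2")})
R1, R2 = frozenset({("c1", "d1")}), frozenset({("c2", "d1")})
C1, C2 = {P1, P2}, {R1, R2}

print(classify({P1 | R1, P1 | R2, P2 | R1, P2 | R2}, C1, C2))
print(classify({P1 | R1, P2 | R2}, C1, C2))
print(classify({P1 | R1, P1 | R2, P2 | R1}, C1, C2))
```

The second call shows the situation described in the text: a network with dependent controls is reported as $\sigma$-partitionable and $\tau$-partitionable, but never as strictly $\sigma$-partitionable.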


6.7  Partitionability Algorithm


In this section an algorithm is presented that takes as input any general
network (with an arbitrary topological structure) and produces one of
four possible outputs.

(1)  The network is not partitionable.

(2)  The network is $\tau$-partitionable.

(3)  The network is strictly $\sigma$-partitionable.

(4)  The network is $\sigma$-partitionable.

The  engineering  interpretation  of  the  four  outputs  is  as  follows: 

(1)  The  network  is  not  partitionable  into  disjoint  subnetworks. 


(2)  The network is partitionable into subnetworks with common control
signals that are dependent upon one another and the combination of the
subnetworks will exactly generate all interconnection patterns of the
original network.

(3)  The  network  is  partitionable  into  subnetworks  with  independent  control 
signals  and  the  combination  of  the  subnetworks  will  exactly  generate  all 
interconnection  patterns  of  the  original  network. 

(4)  The  network  is  partitionable  into  subnetworks  with  independent  control 
signals  and  the  combination  of  the  subnetworks  will  generate  a  superset 
of  interconnection  patterns  of  the  original  network. 

The  algorithm  can  be  programmed  on  a  computer  and  if  the  output  of  the 
algorithm  is  (2)  or  (3)  then  it  will  produce  a  more  efficient  implementation  of 
the  network  in  terms  of  data  path  hardware  and  possibly  control 
implementation.  In  case  (4),  even  though  a  superset  of  the  states  of  the 
original network is obtained, the implementation produced by the algorithm
will be efficient in most instances. The following definitions are needed to
discuss the algorithm and prove its correctness.

Definition  6.7.1: 

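Let $K \in K[V_I \times V_O]$, $K = \langle C \rangle$. Let $C_m \in C$ and let $\langle v_a, v_b \rangle \in C_m$ be
an edge (directed). Denote the undirected arc associated with the
directed edge $\langle v_a, v_b \rangle$ by $\overline{\langle v_a, v_b \rangle}$. Let $G[V_I \times V_O] =
\{\overline{\langle v_a, v_b \rangle} \mid \langle v_a, v_b \rangle \in C_m, \forall C_m \in C\}$. Then $G[V_I \times V_O]$ is the
underlying undirected graph of $K$.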


Definition  6.7.2: 


Let $G[V_I \times V_O]$ be the underlying undirected graph of $K \in K[V_I \times V_O]$.
Then the connected subgraphs of $G[V_I \times V_O]$ are called components of
$G[V_I \times V_O]$.

Notation: Components are denoted by $B^1, B^2, \ldots$. Denote the vertices
associated with $B^r$ by $V_I^r$ and $V_O^r$, $V_I^r \subseteq V_I$, $V_O^r \subseteq V_O$. In a component $B^r$ there
exists a path from each node to every other node and there is no path between
any two nodes from different components. Clearly $G[V_I \times V_O] = \bigcup_r B^r$,
$\bigcup_r V_I^r = V_I$, and $\bigcup_r V_O^r = V_O$.

Definition  6.7.3: 

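Let $G[V_I \times V_O]$ be the underlying graph of $K \in K[V_I \times V_O]$, $K =
\langle C \rangle$. Let $C_m \in C$ and let $B^r$ be a component of $G[V_I \times V_O]$. Define
the projection $p$ of $C_m$ onto $B^r$ as follows:
$p(C_m, B^r) \triangleq \{\langle v_a, v_b \rangle \in C_m \mid \overline{\langle v_a, v_b \rangle} \in B^r\}$.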

Lemma  6.7.4: 

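Let $G[V_I \times V_O]$ be the underlying graph of $K \in K[V_I \times V_O]$, $K =
\langle C \rangle$. Let $C_m \in C$ and let $\{B^1, B^2, \ldots, B^n\}$ be the set of all components
of $G[V_I \times V_O]$. Then $C_m = p(C_m, B^1) \cup p(C_m, B^2) \cup \cdots \cup p(C_m, B^n)$.

Proof:

(1): Show $p(C_m, B^i) \cap p(C_m, B^j) \ne \emptyset \rightarrow B^i = B^j$.

(1.1): $p(C_m, B^i) \cap p(C_m, B^j) \ne \emptyset \rightarrow
\exists\, \langle v_a, v_b \rangle \in p(C_m, B^i)$, $\langle v_a, v_b \rangle \in p(C_m, B^j)$.

(1.2): $\langle v_a, v_b \rangle \in p(C_m, B^i) \rightarrow \langle v_a, v_b \rangle \in C_m$, $\overline{\langle v_a, v_b \rangle} \in B^i$.

(1.3): $\langle v_a, v_b \rangle \in p(C_m, B^j) \rightarrow \langle v_a, v_b \rangle \in C_m$, $\overline{\langle v_a, v_b \rangle} \in B^j$.

(1.4): $\overline{\langle v_a, v_b \rangle} \in B^i$, $\overline{\langle v_a, v_b \rangle} \in B^j$ and $G[V_I \times V_O] = \bigcup_r B^r
\rightarrow B^i = B^j$.

(2): Show $C_m = \bigcup_i p(C_m, B^i)$.

(2.1): Show $C_m \subseteq \bigcup_i p(C_m, B^i)$.

(2.1.1): $\langle v_a, v_b \rangle \in C_m \rightarrow \overline{\langle v_a, v_b \rangle} \in G[V_I \times V_O] \rightarrow
\exists\, B^j$, $\overline{\langle v_a, v_b \rangle} \in B^j \rightarrow \langle v_a, v_b \rangle \in p(C_m, B^j) \rightarrow
\langle v_a, v_b \rangle \in \bigcup_i p(C_m, B^i)$.

(2.2): Show $C_m \supseteq \bigcup_i p(C_m, B^i)$.

(2.2.1): $\langle v_a, v_b \rangle \in \bigcup_i p(C_m, B^i) \rightarrow
\exists\, B^i$, $\langle v_a, v_b \rangle \in p(C_m, B^i) \rightarrow \langle v_a, v_b \rangle \in C_m$.

(3): (1) and (2) $\rightarrow C_m = \bigcup_i p(C_m, B^i)$.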
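Definitions 6.7.1 to 6.7.3 and Lemma 6.7.4 can be exercised on a small network. The following Python sketch is an illustration under assumptions, not the algorithm of this chapter: components are found with a simple union-find over the undirected arcs and are represented by their node sets, input and output nodes are assumed to carry distinct names, and the three-state network is hypothetical.

```python
def components(corrs):
    """Node sets of the connected components of the underlying undirected graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every directed edge <u, v> contributes the undirected arc between u and v.
    for corr in corrs:
        for u, v in corr:
            parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def projection(corr, block):
    """p(C_m, B): the edges of C_m whose undirected arc lies in component B."""
    return {(u, v) for (u, v) in corr if u in block and v in block}


# Hypothetical network with three states.
C = [
    {("u1", "v1"), ("u2", "v2"), ("u3", "v3")},
    {("u1", "v2"), ("u2", "v1"), ("u3", "v4")},
    {("u1", "v1"), ("u2", "v2"), ("u4", "v3")},
]

blocks = components(C)
for corr in C:
    parts = [projection(corr, block) for block in blocks]
    # Lemma 6.7.4: the projections are disjoint and their union is the state.
    assert set().union(*parts) == corr
    assert sum(len(part) for part in parts) == len(corr)
print(sorted(sorted(block) for block in blocks))
```

Representing a component by its node set is enough here, because an edge of the network lies in a component exactly when both of its endpoints do.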


Definition  6.7.5: 


Let $G[V_I \times V_O]$ be the underlying undirected graph of $K \in K[V_I \times V_O]$,
$K = \langle C \rangle$. Let $B^r$ be a component of $G[V_I \times V_O]$. Define the residue
set modulo $B^r$ as follows: $r(B^r) \triangleq \{p(C_b, B^r) \mid \forall C_b \in C\}$.


Theorems and Lemmas 6.7.6 to 6.7.13 are essential components of the
proof of the algorithm presented later. They discuss the conditions of existence
and properties of the component networks, which are the parts into which a
network is decomposed if a decomposition exists.


Theorem  6.7.6: 


Let  B'  be  a  component  of  the  underlying  graph  G[V|  x  Vq]  of 
K  €  K[Vj  X  Vq],  K  =  <C>.  Let  r(B')  be  the  residue  set  modulo  B',  B' 
over  V/  X  V^.  If  |  r(B0|  >  2  then  <r(Bn>  6  K[V{  x  V^).  <r(B0> 
is  called  a  component  network  of  K  denoted  by  K(B'’). 

Proof: 

(1): Show C_a ∈ r(B^i) → C_a ∈ C[V_I^i × V_O^i].

(1.1): C_a ∈ r(B^i) = {p(C_x,B^i) | C_x ∈ C} →
∃ C_x ∈ C, C_a = p(C_x,B^i) → C_a ∈ C[V_I^i × V_O^i].

(2): Show s(r(B^i)) = V_I^i.

(2.1): Show s({p(C_a,B^i) | C_a ∈ C}) ⊆ V_I^i.

u_a ∈ s({p(C_a,B^i) | C_a ∈ C}) → ∃ C_b ∈ C, <u_a,u_b> ∈ C_b,
<u_a,u_b> ∈ B^i → u_a ∈ V_I^i.

(2.2): Show s({p(C_a,B^i) | C_a ∈ C}) ⊇ V_I^i.

u_a ∈ V_I^i → <u_a,u_b> ∈ B^i → ∃ C_b ∈ C, <u_a,u_b> ∈ C_b →
<u_a,u_b> ∈ p(C_b,B^i) →
u_a ∈ s({p(C_a,B^i) | C_a ∈ C}) = s(r(B^i)).

(2.3): (2.1), (2.2) → s(r(B^i)) = V_I^i.

(3): Show d(r(B^i)) = V_O^i.

Same as (2) except replace the s set by the d set.

(4): Show |r(B^i)| ≥ 2.

By Theorem hypothesis.

(5): (1), (2), (3) and (4) → <r(B^i)> ∈ K[V_I^i × V_O^i].




□ 

Given an arbitrary network it is possible that |r(B^i)| = 1 for some B^i;
that is, p(C_a,B^i) = p(C_b,B^i), ∀ C_a,C_b ∈ C. Then r(B^i) does not
constitute a reconfigurable network as defined. From an engineering point
of view this case is handled as follows. If a network contains such a B^i,
that part of the network is constant, that is, it has a single state only.
The constant part is removed from the network K = <C> in two steps.
(1) Construct separately the constant part r(B^i), ∀ B^i such that
|r(B^i)| = 1, as a set of nonreconfigurable links. (2) Form
K' = <{C_a − <v_a,v_b> | C_a ∈ C, ∀ <v_a,v_b> ∈ B^i,
∀ B^i such that |r(B^i)| = 1}>. K' then contains only the block
(blocks) where |r(B^i)| > 1. In the following it is assumed that the
constant blocks of the network have been removed already.

If G[V_I × V_O] = B^1, then K = <r(B^1)>. In this case, K is a σ-prime
network and is not partitionable. The following Lemmas and Theorems are
shown for the case of G[V_I × V_O] having two components, B^1 and B^2, for
reasons of simplicity. They are all applicable to the case of
B^1,B^2,...,B^n, n > 2.

Lemma 6.7.7:

Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let |r(B^i)| = |C|, ∀ i. Then
∃ α such that if <C^3> = K(B^1) τ_α K(B^2) then C ⊆ C^3.

Proof:

(1): |r(B^i)| = |C|, ∀ i.

This is a necessary and sufficient condition for the existence of α:
p(C_x,B^i) ≠ p(C_y,B^i), ∀ C_x,C_y ∈ C, x ≠ y, ∀ i.

(2): <C^3> = K(B^1) τ_α K(B^2) → C^3 = {p(C_a,B^1) ∪ p(C_b,B^2)
| α(p(C_a,B^1)) = p(C_b,B^2), C_a ∈ C, C_b ∈ C}.

(3): Let α: {p(C_a,B^1) | C_a ∈ C} → {p(C_b,B^2) | C_b ∈ C},
α(p(C_a,B^1)) = p(C_a,B^2).

(4): C_a ∈ C → C_a = p(C_a,B^1) ∪ p(C_a,B^2).

(5): (2), (3) and (4) → C_a ∈ C^3 → C ⊆ C^3.

□ 

Lemma  6.7.8: 

Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let |r(B^i)| = |C|, ∀ i. Then
∃ α such that if <C^3> = K(B^1) τ_α K(B^2) then C^3 ⊆ C.

Proof:

(1): (1), (2), and (3) from proof of Lemma 6.7.7.

(2): C_a ∈ C^3 → C_a = p(C_b,B^1) ∪ p(C_b,B^2), C_b ∈ C.

(3): (1) and (2) → C_a ∈ C → C^3 ⊆ C.

□ 

Theorem  6.7.9: 

Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let |r(B^i)| = |C|, ∀ i. Then
∃ τ_α such that K(B^1) τ_α K(B^2) = K.

Proof:

(1): Let α: {p(C_a,B^1) | C_a ∈ C} → {p(C_a,B^2) | C_a ∈ C},
α(p(C_a,B^1)) = p(C_a,B^2). Let K(B^1) τ_α K(B^2) = <C^3>.

(2): Lemma 6.7.7 → C ⊆ C^3.

(3): Lemma 6.7.8 → C^3 ⊆ C.

(4): (2) and (3) → C^3 = C.

(5): Theorem 4.6.8, C^3 = C → K(B^1) τ_α K(B^2) = K.


Lemma  6.7.10: 


Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let
K(B^1) σ K(B^2) = <C^3>. Then C ⊆ C^3.


Proof:


(1): C_a ∈ C → C_a = p(C_a,B^1) ∪ p(C_a,B^2).

(2): <C^3> = K(B^1) σ K(B^2) = <{p(C_a,B^1) | C_a ∈ C}> σ
<{p(C_b,B^2) | C_b ∈ C}> → C_a ∈ C^3 → C ⊆ C^3.


Theorem  6.7.11: 


Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let K(B^1) σ K(B^2) =
<C^3>. Let |r(B^1)| · |r(B^2)| = |C|. Then K(B^1) σ K(B^2) = K.

Proof:

(1): By Lemma 6.7.10 C ⊆ C^3.

By Theorem hypothesis |r(B^1)| · |r(B^2)| = |C^3| = |C| → C = C^3.

(2): By Theorem 4.6.8 and (1) → K(B^1) σ K(B^2) = K.


□ 


Theorem  6.7.12: 

Let {B^1, B^2} be the set of components of the underlying graph
G[V_I × V_O] of K ∈ K[V_I × V_O], K = <C>. Let K(B^1) σ K(B^2) =
<C^3>. Let |r(B^1)| · |r(B^2)| > |C|. Then K(B^1) σ K(B^2) ⊃_c K and
K(B^1) σ K(B^2) ≠ K.

Proof:

(1): By Lemma 6.7.10 C ⊆ C^3.

Theorem hypothesis |r(B^1)| · |r(B^2)| = |C^3| > |C| → C ⊂ C^3.

(2): By Theorem 4.6.7 and (1) → K(B^1) σ K(B^2) ⊃_c K, and
Theorem 4.6.8 and (1) → K(B^1) σ K(B^2) ≠ K.

□ 

Definition 6.7.13:

If B^1,B^2,...,B^n are the components of G[V_I × V_O], where G[V_I × V_O]
is the underlying graph of K, then K(B^1), K(B^2),...,K(B^n) is a prime
decomposition of K.


The algorithm is presented below. The input is an arbitrary network
K ∈ K[V_I × V_O], K = <C>, with the constant part removed. The output is
one of (1) K is not partitionable, (2) K is τ-partitionable, (3) K is
strictly σ-partitionable, (4) K is σ-partitionable. In cases (2), (3), and
(4) the algorithm also produces the component networks
K(B^1), K(B^2),...,K(B^n), in step (6).

Algorithm:

Input: K ∈ K[V_I × V_O], K = <C>.

Output: (1): K is not partitionable,
or (2): K is τ-partitionable,
or (3): K is strictly σ-partitionable,
or (4): K is σ-partitionable.

(1) Construct the underlying graph G[V_I × V_O] of K.

(2) Find components B^1,B^2,...,B^n of G[V_I × V_O].

(3) If (n = 1) return (1).

(4) Find p(C_a,B^i), ∀ C_a ∈ C, i = 1,2,...,n.

(5) Find r(B^i) = {p(C_a,B^i) | ∀ C_a ∈ C}, i = 1,2,...,n.

(6) Construct K(B^i) = <r(B^i)>, i = 1,2,...,n.

(7) If (|r(B^i)| = |C|, i = 1,2,...,n)
then return (2).

(8) If (∏_{i=1}^{n} |r(B^i)|) = |C|
then return (3).

(9) Else return (4).


The proof of correctness is directly implied by Theorems 6.7.9, 6.7.11,
and 6.7.12.

□ 
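
The nine steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: a correspondence is assumed to be a set of (input, output) pairs, the names `components` and `classify` are hypothetical, and the constant blocks are assumed to have been removed beforehand, as the text requires.

```python
from math import prod

def components(correspondences):
    """Edge sets B^1..B^n of the connected components of the underlying
    undirected graph (inputs and outputs are kept as distinct vertices)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = set().union(*correspondences)
    for u, v in edges:
        parent[find(("in", u))] = find(("out", v))
    blocks = {}
    for u, v in edges:
        blocks.setdefault(find(("in", u)), set()).add((u, v))
    return [frozenset(b) for b in blocks.values()]

def classify(correspondences):
    """Return (code, residue_sets); code follows outputs (1)-(4) of the text."""
    C = {frozenset(c) for c in correspondences}
    blocks = components(C)                            # steps (1)-(2)
    if len(blocks) == 1:                              # step (3)
        return 1, []
    residues = [{c & b for c in C} for b in blocks]   # steps (4)-(6): r(B^i)
    sizes = [len(r) for r in residues]
    if all(s == len(C) for s in sizes):               # step (7)
        return 2, residues
    if prod(sizes) == len(C):                         # step (8)
        return 3, residues
    return 4, residues                                # step (9)

# The six correspondences of Example 7.6.10 (identity, two 3-cycles on
# u_1..u_3, each with and without the swap of u_0 and u_4).
example = [
    {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)},
    {(0, 0), (1, 2), (2, 3), (3, 1), (4, 4)},
    {(0, 0), (1, 3), (2, 1), (3, 2), (4, 4)},
    {(0, 4), (4, 0), (1, 1), (2, 2), (3, 3)},
    {(0, 4), (1, 2), (2, 3), (3, 1), (4, 0)},
    {(0, 4), (1, 3), (2, 1), (3, 2), (4, 0)},
]
print(classify(example)[0])
```

For the correspondences of Example 7.6.10 the two residue sets have sizes 2 and 3, whose product equals |C| = 6, so the sketch reports output (3). Here the projection p(C_a,B^i) is taken to be the intersection of C_a with the edge set of B^i, which is an assumption about a definition given earlier in the thesis.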

The outputs of the algorithm can be used in the following ways. If the
output is "1" (not partitionable), then the system designer will know that
the network cannot be divided into individual subnetworks. If the output is
"3" (strictly σ-partitionable), then the network can be partitioned and the
composition of the component networks will produce a set of correspondences
identical to that of the original network. Note that if a network is
strictly σ-partitionable it is not τ-partitionable nor σ-partitionable. If
the output is "2", the network is τ-partitionable. Any network that is
τ-partitionable is also σ-partitionable. However, if a network is
τ-partitionable then |r(B^i)| = |r(B^j)| = |C|, 1 ≤ i, j ≤ n, which is not
true in general for a σ-partitionable network. Since
|r(B^i)| = |r(B^j)| = |C|, 1 ≤ i, j ≤ n, the number of correspondences in
each component network <r(B^i)> is the same (|C|) for all i, 1 ≤ i ≤ n.
This property means that the same control decoders can be used in all
network components in a τ-partitionable network. If the output is "4"
(σ-partitionable), then the network can be partitioned and the composition
of the component networks will produce a set of correspondences that is a
superset of that of the original network.

The output of the algorithm applies only to the reconfigurable part of the
network because partitionability is defined in terms of a decomposition
into "reconfigurable" network components (|r(B^i)| > 1). If the original
network had some B^i such that |r(B^i)| = 1, then those constant
component(s) should be added to the network component(s) generated by the
algorithm in order to reproduce the original network.


There are types of partitionability other than the one discussed here. For
example, the partitionability of networks where some of the network
correspondences are not used, as can be done with the cube network, was
discussed in [Sie80, Sie85].


6.8  Conclusions 


In this chapter the interconnection network properties of composition,
decomposition, and partitionability were analyzed. The general model of
interconnection networks, defined in Chapter 4, was used to describe
composition, decomposition, and partitionability properties of networks.
The τ- and σ-composition and the τ- and σ-decomposition discussed here are
of horizontal type and they are described in detail in the text.

The importance of the partitionability property is described in the
introduction. It was found that there are actually many different types of
partitionability; the type that uses all states for consideration of
partitionability was discussed in detail here. This type of
partitionability, consisting of three subtypes, is analyzed in this
chapter. The three subtypes, τ-partitionability, σ-partitionability, and
strict σ-partitionability, were defined and analyzed. An algorithm to
determine whether a network is partitionable and, if it is, which of the
three subtypes applies, was presented and proven correct.


7.1  Introduction 




In this chapter the problem of synthesis of single stage partitionable
interconnection networks is analyzed; consequently, this chapter may be
viewed as an application section of the chapter on analysis. For a
designer, the analysis allows an evaluation of networks and their
properties [AdS82b, Gok76, Law75, McS82, SeS85], in contrast to the
synthesis, which provides a construction method for partitionable networks.
The body of this chapter consists of two major parts, each of which
contains some examples to illuminate the issues.

In the first part, an example of a single stage partitionable network will
be presented. Then, an algorithm to generate a large class of single stage
partitionable networks will be developed and proven correct. This algorithm
is based upon the results presented in the chapter on analysis. For ease of
presentation the discussion will be presented for the case of networks with
|V_I| = |V_O| and with two network components only; however, it can easily
be generalized to networks where |V_I| ≠ |V_O| and to networks with more
than two components.

The second part of the body of this chapter discusses the problem of
synthesis of a special case of partitionable networks. The special class of
networks consists of those networks that are isomorphic to a direct product
of groups [Han88, Her75]. Since groups have been studied extensively in
abstract algebra, techniques are known to determine the possibility of
decomposition of a given group into a direct product of groups. Again, for
ease of presentation, the discussion is shown for the direct product of two
groups only, but can be generalized to a product of multiple groups.


7.2  Overview 


In Section 7.3 the problem is defined. In Section 7.4 the previous work
done is outlined. In Section 7.5 the basic concepts are presented. In
Section 7.6 some examples and algorithms to synthesize large classes of
single stage partitionable networks are presented. In addition, a special
case of partitionable interconnection networks that are isomorphic to a
direct product of groups is described. In Section 7.7 the conclusions for
this chapter are presented.


7.3  Problem  Statement 


In  this  chapter,  the  results  presented  in  the  chapter  on  analysis  are  used  to 
synthesize  partitionable  networks.  Based  upon  the  examples  and  the  work  in 
the  previous  chapter,  an  algorithm  is  developed  that  allows  the  synthesis  of  a 
large  class  of  partitionable  networks.  An  interesting,  special  class  of  networks 
which  is  isomorphic  to  a  direct  product  of  groups  is  analyzed.  Since  the 
problem  of  decomposition  of  groups  into  a  direct  product  of  groups  is  well 




known  in  abstract  algebra  [Han88,  Her75],  it  can  be  used  to  evaluate  the 
partitionability  of  these  networks. 


7.4  Previous  Work 


The material in this chapter, the synthesis of partitionable
interconnection networks, is directly based on the material in the chapter
on the analysis of partitionable interconnection networks. The synthesis
procedure is based on the chapter on analysis; consequently, this chapter
can be viewed as an application section of the material discussed there.


7.5 Basic Concepts


In  this  section  the  basic  concepts  are  presented.  Some  definitions  can  be 
found  in  a  text  on  abstract  algebra  and  are  included  here  for  completeness  only 
[Han88, Her75].




Definition 7.5.1:


Let G be a set with * a binary operation with closure. Let * be
associative.

(1) ∃ 1 ∈ G such that 1 * g = g * 1 = g, ∀ g ∈ G. (1 is the identity
element.)

(2) ∃ g^{-1} ∈ G such that g * g^{-1} = g^{-1} * g = 1, ∀ g ∈ G. (Each
element has an inverse.)

Then <G, *> is called a group.


Definition  7.5.2: 


Let C[V_I × V_O] be a C-set. Let V_I = {u_0,u_1,...,u_{n-1}} and V_O =
{v_0,v_1,...,v_{n-1}}.

Define a binary operation γ on C[V_I × V_O] as follows:

γ: C[V_I × V_O] × C[V_I × V_O] → C[V_I × V_O],
C_a γ C_b ≜ {<u_i,v_l> | <u_i,v_j> ∈ C_a, <u_k,v_l> ∈ C_b, j = k}.
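
As a small illustration of Definition 7.5.2, the γ operation can be sketched as a composition of relations. The representation (a correspondence as a frozenset of (input index, output index) pairs) and the function name `gamma` are assumptions made for this sketch only.

```python
def gamma(c_a, c_b):
    """C_a gamma C_b: link u_i to v_l whenever <u_i,v_j> is in C_a and
    <u_k,v_l> is in C_b with j == k."""
    return frozenset(
        (i, l) for (i, j) in c_a for (k, l) in c_b if j == k
    )

# Two uniform shifts on four nodes: shifting by 1 twice is shifting by 2.
shift1 = frozenset((i, (i + 1) % 4) for i in range(4))
shift2 = frozenset((i, (i + 2) % 4) for i in range(4))
print(gamma(shift1, shift1) == shift2)
```

When the correspondences are permutations, as in the group examples later in this chapter, γ reduces to ordinary composition of permutations applied left to right.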


7.6 Synthesis of Single Stage Partitionable Networks


This section consists of two major parts. In the first part, an example of
a single stage partitionable network will be presented. Based upon the
example and the material in the chapter on analysis an algorithm will be
developed which allows the synthesis of a large class of partitionable
networks. The discussion will be restricted to the case of |V_I| = |V_O|
for ease of presentation; however, the results are applicable to the case
|V_I| ≠ |V_O|. The construction is given in terms of constraints on the
structure of the I/O correspondences of the network. In the second part an
interesting special case of single stage partitionable interconnection
networks will be discussed. It will be shown that this class is isomorphic
to a class of groups. For that special class of networks, it will be shown
that a network is strictly σ-partitionable if and only if it is isomorphic
to a direct product of two groups. Since groups have been studied
extensively in abstract algebra, analytical methods are known to find a
(possible) decomposition into a direct product.

Example  7.6.1: 

Let K ∈ K[V_I × V_O], K = <C> be a network. Let
|V_I| = |V_O| = m. Denote V_I = {u_0,u_1,...,u_{m-1}} and V_O =
{v_0,v_1,...,v_{m-1}}. Let r, 0 < r < m-1. Let C = {C_p, C_q}, and ⊕_r be
addition modulo r. Let a, b, c, d be arbitrary integers such that
a ≠ c mod r and b ≠ d mod m-r.

C_p = {<u_i,v_j> | j = i ⊕_r a, 0 ≤ i < r} ∪
{<u_i,v_j> | j = r+(i ⊕_{m-r} b), r ≤ i < m},

C_q = {<u_i,v_j> | j = i ⊕_r c, 0 ≤ i < r} ∪
{<u_i,v_j> | j = r+(i ⊕_{m-r} d), r ≤ i < m}.

Show  that  the  network  is  partitionable. 

Solution: 

(1): Denote V_I^1 = {u_0,u_1,...,u_{r-1}}, V_O^1 = {v_0,v_1,...,v_{r-1}}
and V_I^2 = {u_r,u_{r+1},...,u_{m-1}}, V_O^2 = {v_r,v_{r+1},...,v_{m-1}}.
Intuitively the network can be partitioned into one network over
V_I^1 × V_O^1 and a second network over V_I^2 × V_O^2.

(2): Although the components of the underlying graph of K could be finer
than the B^1 and B^2 described below, the form of C_p and C_q guarantees
that there are at least two components B^1 and B^2 as follows.

B^1 = {<u_i,v_j> | j = i ⊕_r a, 0 ≤ i < r} ∪
{<u_i,v_j> | j = i ⊕_r c, 0 ≤ i < r}.

B^2 = {<u_i,v_j> | j = r+(i ⊕_{m-r} b), r ≤ i < m} ∪
{<u_i,v_j> | j = r+(i ⊕_{m-r} d), r ≤ i < m}.

(3): Let K(B^1) ∈ K[V_I^1 × V_O^1], K(B^1) = <C^1>. C^1 = {C_p^1, C_q^1},

C_p^1 = p(C_p,B^1) = {<u_i,v_j> | j = i ⊕_r a, 0 ≤ i < r},

C_q^1 = p(C_q,B^1) = {<u_i,v_j> | j = i ⊕_r c, 0 ≤ i < r}.

Let K(B^2) ∈ K[V_I^2 × V_O^2], K(B^2) = <C^2>. C^2 = {C_p^2, C_q^2},

C_p^2 = p(C_p,B^2) = {<u_i,v_j> | j = r+(i ⊕_{m-r} b), r ≤ i < m},

C_q^2 = p(C_q,B^2) = {<u_i,v_j> | j = r+(i ⊕_{m-r} d), r ≤ i < m}.

(4): Then K(B^1) σ K(B^2) ⊃_c K, therefore K is σ-partitionable.
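
A concrete instance of Example 7.6.1 can be checked numerically. The sketch below assumes m = 5, r = 2, a = 0, b = 0, c = 1, d = 1 (values chosen here only for illustration), represents a correspondence as a frozenset of (input index, output index) pairs, and reads σ-composition as pairing every state of one component with every state of the other, which is the reading suggested by the count |r(B^1)| · |r(B^2)| = |C^3| in Theorem 6.7.11. The helper names are hypothetical.

```python
from itertools import product

def build_example(m, r, a, b, c, d):
    """Return (C_p, C_q) of Example 7.6.1 for the given parameters."""
    def correspondence(low_shift, high_shift):
        low = {(i, (i + low_shift) % r) for i in range(r)}
        high = {(i, r + (i + high_shift) % (m - r)) for i in range(r, m)}
        return frozenset(low | high)
    return correspondence(a, b), correspondence(c, d)

def project(corr, inputs):
    """Links of corr whose input index lies in the given block of inputs."""
    return frozenset((i, j) for i, j in corr if i in inputs)

m, r = 5, 2
C_p, C_q = build_example(m, r, a=0, b=0, c=1, d=1)
block1, block2 = range(r), range(r, m)

res1 = {project(C_p, block1), project(C_q, block1)}  # states of K(B^1)
res2 = {project(C_p, block2), project(C_q, block2)}  # states of K(B^2)

# Assumed reading of sigma-composition: all pairings of component states.
composite = {x | y for x, y in product(res1, res2)}
print(len(composite), {C_p, C_q} < composite)
```

With these parameters the composite network has four correspondences and properly contains {C_p, C_q}, which matches step (4) of the solution.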

The example can be generalized into the following algorithm to generate a
large class of partitionable interconnection networks.

Algorithm  7.6.2: 

Let K ∈ K[V_I × V_O], K = <C> be a network.

(1) Let V_I = {u_0,u_1,...,u_{m-1}} and V_O = {v_0,v_1,...,v_{m-1}}.

(2) Let C_1,C_2,...,C_n ∈ C, and r, 0 < r < m-1.

C_1 = {<u_i,v_j> | j = f_1(i) mod r, 0 ≤ i < r} ∪
{<u_i,v_j> | j = r+(g_1(i) mod (m-r)), r ≤ i < m},

C_2 = {<u_i,v_j> | j = f_2(i) mod r, 0 ≤ i < r} ∪
{<u_i,v_j> | j = r+(g_2(i) mod (m-r)), r ≤ i < m}, ...

C_n = {<u_i,v_j> | j = f_n(i) mod r, 0 ≤ i < r} ∪
{<u_i,v_j> | j = r+(g_n(i) mod (m-r)), r ≤ i < m}.

The functions f_k(i), g_k(i) are arbitrary integer functions such that:
If 0 ≤ x, y < r then (f_k(x) mod r) = (f_k(y) mod r) iff x = y. If
r ≤ x, y < m then (g_k(x) mod (m-r)) = (g_k(y) mod (m-r)) iff x = y.
The above is necessary to ensure that the constructed correspondences
are nondestructive.


Theorem 7.6.3:


Every network constructed using Algorithm 7.6.2 is a σ-partitionable
network.


Proof: 


Similar  to  solution  of  Example  7.6.1. 


The  following  algorithm  will  generate  a  large  class  of  r-way  partitionable 
networks,  where  r  is  the  number  of  components.  It  is  easier  to  use  than  the 
Algorithm  7.6.2  and  the  class  is  smaller  than  the  one  generated  by  Algorithm 
7.6.2.  In  addition  the  component  networks  of  the  partitionable  networks 
generated  by  Algorithm  7.6.4  are  isomorphic  to  each  other. 

Algorithm 7.6.4:

Let r be any integer, let N = r^m. Construct a network K over
V_I × V_O, K = <C> as follows.

(1) Let V_I = {p_{m-1}p_{m-2}...p_0 | p_i = 0,1,...,r-1}, and
V_O = {q_{m-1}q_{m-2}...q_0 | q_i = 0,1,...,r-1}.

(2) Construct C_j ∈ C as follows:

C_j = {<p_{m-1}p_{m-2}...p_0, p_{m-1}π_j(p_{m-2}...p_0)>}, where π_j is
any permutation of letters p_{m-2}...p_0.

Theorem  7.6.5: 

Every network constructed using Algorithm 7.6.4 is a σ-partitionable
network.

Proof:

Intuitively, there will be r subnetworks, where the kth subnetwork has
input labels of the form kp_{m-2}...p_1p_0 and output labels of the form
kq_{m-2}...q_1q_0. Using similar steps as in Example 7.6.1, it can be shown
that the component networks are {K^k = <C^k> | K^k ∈ K[V_I^k × V_O^k],
k = 0,1,...,r-1} with V_I^k = {kp_{m-2}...p_0}, and V_O^k = {kq_{m-2}...q_0},
k = 0,1,...,r-1.

□ 

Example  7.6.6: 

Let r = 2, let m = 4, N = 16. Construct a network
K ∈ K[V_I × V_O], K = <C> as follows.

Let V_I = {p_3p_2p_1p_0 | p_i = 0,1}, and
V_O = {q_3q_2q_1q_0 | q_i = 0,1}. Let C = {C_0,C_1,C_2}.

C_0 = {<p_3p_2p_1p_0, p_3p_2p_0p_1> | p_i = 0,1},

C_1 = {<p_3p_2p_1p_0, p_3p_0p_2p_1> | p_i = 0,1},

C_2 = {<p_3p_2p_1p_0, p_3p_1p_0p_2> | p_i = 0,1}.

Show that the network is partitionable.


Solution:


Although there may be more than two components, it is guaranteed that
there are at least two component networks. The kth subnetwork has
input labels of the form kp_{m-2}...p_1p_0 and output labels of the form
kq_{m-2}...q_1q_0, k = 0,1. The component networks are {K^k =
<C^k> | K^k ∈ K[V_I^k × V_O^k], k = 0,1}.
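
The construction of Algorithm 7.6.4, instantiated for Example 7.6.6, can be sketched as follows. Labels are assumed to be tuples of digits (p_{m-1}, ..., p_0), and each π_j is given as a tuple `sigma` of positions within the low-order digits; both choices, and the name `build_network`, are assumptions of this sketch.

```python
from itertools import product

def build_network(r, m, digit_perms):
    """One correspondence per entry of digit_perms. Each sigma rearranges
    the m-1 low-order digits; the leading digit p_{m-1} is left fixed."""
    labels = list(product(range(r), repeat=m))  # (p_{m-1}, ..., p_0)
    network = []
    for sigma in digit_perms:
        corr = frozenset(
            (p, (p[0],) + tuple(p[1:][s] for s in sigma)) for p in labels
        )
        network.append(corr)
    return network

# Example 7.6.6: low digits (p2, p1, p0) sit at positions (0, 1, 2).
# C_0 -> (p2, p0, p1), C_1 -> (p0, p2, p1), C_2 -> (p1, p0, p2).
net = build_network(2, 4, [(0, 2, 1), (2, 0, 1), (1, 2, 0)])

# No link joins labels with different leading digits, so the links split
# into at least r = 2 separate blocks.
no_crossing = all(p[0] == q[0] for corr in net for p, q in corr)
print(no_crossing)
```

Because every π_j only rearranges the low-order digits, the leading digit acts as the index k of the subnetwork, which is the intuition given in the proof of Theorem 7.6.5.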

Now consider a special network K ∈ K[V_I × V_O], K = <C> such that
<C, γ> is a group. It is possible to view the correspondence
C_k = {<p_i, q_j> | p_i ∈ V_I, q_j ∈ V_O} as the permutation
π_k = {<i, j> | i, j ∈ A}. For example, let
C_k = {<p_i, q_j> | p_i ∈ V_I, q_j ∈ V_O, j = i ⊕ k mod m} be a
correspondence; the induced permutation is π_k =
{<i, j> | j = i ⊕ k mod m, i, j ∈ A}. The partitionability of this class of
networks is related to the direct product composition of groups in abstract
algebra, as will be shown by the following theorems.

Theorem  7.6.7: 

Let K ∈ K[V_I × V_O], K = <C> be a strictly σ-partitionable network.
Let <C, γ> be a group, where γ is the composition of maps (see Definition
7.5.2). Then <C, γ> is isomorphic to a direct product of two groups.

Proof: 

(1): As stated previously, it is assumed for simplicity of presentation
that the network has only two components, call them
K(B^1) = <C^1> and K(B^2) = <C^2>. Let
C^1 = {p(C_i,B^1) | C_i ∈ C} and C^2 = {p(C_j,B^2) | C_j ∈ C} be the
two sets of correspondences. It will be shown that <C^1, γ>
and <C^2, γ> are groups and their direct product is isomorphic
to <C, γ>.

(2): Show <C^1, γ> is a group.

(2.1): Show <C^1, γ> has closure: Show ∃ C_k ∈ C such that
p(C_i,B^1) γ p(C_j,B^1) = p(C_k,B^1).

(2.1.1): Let C_k = C_i γ C_j → p(C_k,B^1) = p(C_i γ C_j,B^1) =
{<u_a,v_b> | <u_a,v_c> ∈ C_i, <u_c,v_b> ∈ C_j, <u_a,v_b> ∈ B^1}.

(2.1.2): <u_a,v_c> ∈ C_i, <u_a,v_b> ∈ B^1 → <u_a,v_c> ∈ B^1 →
<u_a,v_c> ∈ p(C_i,B^1).

(2.1.3): <u_c,v_b> ∈ C_j, <u_a,v_b> ∈ B^1 → <u_c,v_b> ∈ B^1 →
<u_c,v_b> ∈ p(C_j,B^1).

(2.1.4): p(C_k,B^1) = {<u_a,v_b> | <u_a,v_c> ∈ C_i, <u_c,v_b> ∈ C_j,
<u_a,v_b> ∈ B^1} = {<u_a,v_b> | <u_a,v_c> ∈ p(C_i,B^1),
<u_c,v_b> ∈ p(C_j,B^1)} = p(C_i,B^1) γ p(C_j,B^1).

(2.2): Show γ is associative:

By definition of the operation.

(2.3): Let C_e ∈ C, C_e γ C_i = C_i γ C_e = C_i; then p(C_e,B^1) is the
identity in C^1.

(2.4): Show C^1 contains inverses. Let C_i γ C_j = C_e; then
p(C_i,B^1) γ p(C_j,B^1) = p(C_e,B^1).

(3): Similarly can show <C^2, γ> is a group.

(4): Construct direct product group <C^1 × C^2, ⊗>.

(4.1): Define ⊗:

<p(C_i,B^1), p(C_j,B^2)> ⊗ <p(C_k,B^1), p(C_l,B^2)> =
<p(C_i,B^1) γ p(C_k,B^1), p(C_j,B^2) γ p(C_l,B^2)>.

From algebra it is known that the direct product of groups is a
group.

(5): Define θ: C → C^1 × C^2. θ(C_i) = <p(C_i,B^1), p(C_i,B^2)>.

(5.1): Show θ is a group homomorphism.

Show: θ(C_i γ C_j) = θ(C_i) ⊗ θ(C_j).

(5.1.1): θ(C_i γ C_j) = θ(C_k) such that C_k = C_i γ C_j.

(5.1.2): θ(C_k) = <p(C_k,B^1), p(C_k,B^2)>.

(5.1.3): (2) and (3) → p(C_i,B^1) γ p(C_j,B^1) = p(C_k,B^1), and
p(C_i,B^2) γ p(C_j,B^2) = p(C_k,B^2)
→ <p(C_k,B^1), p(C_k,B^2)>
= <p(C_i,B^1) γ p(C_j,B^1), p(C_i,B^2) γ p(C_j,B^2)>
= <p(C_i,B^1), p(C_i,B^2)> ⊗ <p(C_j,B^1), p(C_j,B^2)>
= θ(C_i) ⊗ θ(C_j). Therefore θ is a group homomorphism.

(5.2): <p(C_i,B^1), p(C_j,B^2)> ∈ C^1 × C^2 and K is strictly
σ-partitionable → ∃ C_k such that p(C_i,B^1) = p(C_k,B^1) and
p(C_j,B^2) = p(C_k,B^2); therefore θ is onto.

(5.3): Show θ is 1:1.

Kernel of θ = {C_e}, where C_e is the identity of <C, γ> → θ is 1:1.

(6): (5) → <C, γ> is isomorphic to <C^1 × C^2, ⊗>.


Theorem  7.6.8: 


Let <C, γ> be a group that is isomorphic to a direct product of two
groups <C^1, γ> and <C^2, γ>. Then K = <C> is a strictly
σ-partitionable network.

Proof:

(1): Let <C, γ> be the group. Let <C, γ> ≅ <C^1, γ> ⊗ <C^2, γ>, where
⊗ is the direct product and the sets C^1 and C^2 are as in the proof of
Theorem 7.6.7.

(2): Then the rest of the proof consists of reversing the steps of the
proof of Theorem 7.6.7.

□ 

The  next  two  examples  show  an  application  of  Theorem  7.6.8. 

Example  7.6.9: 

Let K ∈ K[V_I × V_O], K = <C> be a network. Let V_I =
{u_0,u_1,u_2,u_3,u_4,u_5} and V_O = {v_0,v_1,v_2,v_3,v_4,v_5}.

Let C = {C_1,C_2,C_3,C_4},

C_1 = {<u_i,v_i> | i = 0,1,...,5},

C_2 = {<u_0,v_1>, <u_1,v_0>, <u_i,v_i> | i = 2,3,...,5},

C_3 = {<u_0,v_0>, <u_1,v_1>, <u_2,v_3>, <u_3,v_2>,
<u_4,v_5>, <u_5,v_4>},

C_4 = {<u_0,v_1>, <u_1,v_0>, <u_2,v_3>, <u_3,v_2>,
<u_4,v_5>, <u_5,v_4>}.

Show that the network is strictly σ-partitionable.


Solution:


(1): Show that <C, γ> is a group.

From the multiplication table, Table 7.1, it can be seen that
<C, γ> is a group.

(2): Consider <C^1, γ>. Let C^1 = {C_1^1, C_2^1}, C_1^1 =
{<u_0,v_0>, <u_1,v_1>}, and C_2^1 = {<u_0,v_1>, <u_1,v_0>}. Then
<C^1, γ> is a group.

(3): Consider <C^2, γ>. Let C^2 = {C_1^2, C_2^2}, C_1^2 =
{<u_i,v_i> | i = 2,3,...,5}, and C_2^2 = {<u_2,v_3>, <u_3,v_2>,
<u_4,v_5>, <u_5,v_4>}. Then <C^2, γ> is a group.

(4): Let the direct product group be <C^1 × C^2, ⊗> where
<C_i^1, C_j^2> ⊗ <C_k^1, C_l^2> = <C_i^1 γ C_k^1, C_j^2 γ C_l^2>.

(5): Given θ: C → C^1 × C^2 a map, θ(C_i) = <p(C_i,B^1), p(C_i,B^2)>,
and C_1 = C_1^1 ∪ C_1^2, C_2 = C_2^1 ∪ C_1^2, C_3 = C_1^1 ∪ C_2^2,
C_4 = C_2^1 ∪ C_2^2, then it is easy to show that θ is a homomorphism,
onto and 1:1.

(6): Consequently K is a strictly σ-partitionable network.
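
Steps (1) and (5) of this solution can be checked mechanically. The sketch below is an illustration under stated assumptions: correspondences are frozensets of (input index, output index) pairs, `gamma` implements the operation of Definition 7.5.2, and `theta` splits a correspondence into its links on {u_0,u_1} and on {u_2,...,u_5}, the two input sets used in steps (2) and (3). All names are hypothetical.

```python
from itertools import product

def gamma(c_a, c_b):
    """Composition of Definition 7.5.2 on sets of (input, output) pairs."""
    return frozenset((i, l) for (i, j) in c_a for (k, l) in c_b if j == k)

def perm(mapping, n=6):
    """Correspondence on u_0..u_{n-1}: identity except where mapping differs."""
    return frozenset((i, mapping.get(i, i)) for i in range(n))

C1 = perm({})
C2 = perm({0: 1, 1: 0})
C3 = perm({2: 3, 3: 2, 4: 5, 5: 4})
C4 = perm({0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4})
C = [C1, C2, C3, C4]

# Step (1): closure of <C, gamma> (the content of a multiplication table).
closed = all(gamma(x, y) in C for x, y in product(C, repeat=2))

def theta(c):
    """Split c into its part on inputs 0..1 and its part on inputs 2..5."""
    low = frozenset((i, j) for i, j in c if i < 2)
    high = frozenset((i, j) for i, j in c if i >= 2)
    return low, high

# Step (5): theta respects the operation componentwise and is one-to-one.
homomorphism = all(
    theta(gamma(x, y))
    == (gamma(theta(x)[0], theta(y)[0]), gamma(theta(x)[1], theta(y)[1]))
    for x, y in product(C, repeat=2)
)
one_to_one = len({theta(c) for c in C}) == len(C)
print(closed, homomorphism, one_to_one)
```

Since each of the two parts takes two distinct values, the image of θ has 2 · 2 = 4 elements, so one-to-one on the four elements of C also gives onto.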

Example  7.6.10: 

Let K ∈ K[V_I × V_O], K = <C> be a network. Let V_I =
{u_0,u_1,u_2,u_3,u_4} and V_O = {v_0,v_1,v_2,v_3,v_4}.

Let C = {C_1,C_2,C_3,C_4,C_5,C_6},

C_1 = {<u_i,v_i> | i = 0,1,...,4},

C_2 = {<u_0,v_0>, <u_1,v_2>, <u_2,v_3>, <u_3,v_1>, <u_4,v_4>},

C_3 = {<u_0,v_0>, <u_1,v_3>, <u_2,v_1>, <u_3,v_2>, <u_4,v_4>},

C_4 = {<u_0,v_4>, <u_4,v_0>, <u_i,v_i> | i = 1,2,3},

C_5 = {<u_0,v_4>, <u_1,v_2>, <u_2,v_3>, <u_3,v_1>, <u_4,v_0>},

C_6 = {<u_0,v_4>, <u_1,v_3>, <u_2,v_1>, <u_3,v_2>, <u_4,v_0>}.

Show that the network is strictly σ-partitionable.

Solution: 

(1): Show that <C, γ> is a group.

Constructing the multiplication table, Table 7.2, <C, γ> is a group.

(2): Consider <C^1, γ>. Let C^1 = {C_1^1, C_2^1}, C_1^1 =
{<u_0,v_0>, <u_4,v_4>}, and C_2^1 = {<u_0,v_4>, <u_4,v_0>}. Then
<C^1, γ> is a group.

(3): Consider <C^2, γ>. Let C^2 = {C_1^2, C_2^2, C_3^2}, C_1^2 =
{<u_i,v_i> | i = 1,2,3}, C_2^2 = {<u_1,v_2>, <u_2,v_3>, <u_3,v_1>},
C_3^2 = {<u_1,v_3>, <u_2,v_1>, <u_3,v_2>}. Then <C^2, γ> is a
group.

(4): Let the direct product group be <C^1 × C^2, ⊗> where
<C_i^1, C_j^2> ⊗ <C_k^1, C_l^2> = <C_i^1 γ C_k^1, C_j^2 γ C_l^2>.

(5): Same as Example 7.6.9. θ is a homomorphism, onto and 1:1.

(6): Consequently K is a strictly σ-partitionable network.


7.7 Conclusions


In this chapter the problem of synthesis of single stage partitionable
networks was studied. This chapter can also be viewed as an application
section of the chapter on analysis. This chapter contains two algorithms
and a theorem describing the construction of σ-partitionable networks. The
first algorithm is the most general and produces a large class of
σ-partitionable networks. The second algorithm is easier to use but it
generates a smaller class of σ-partitionable networks. The theorem
describes the existence of a class of strictly σ-partitionable networks and
it can be used in a bidirectional sense, that is, (a) it can be used to
decide whether a certain class of networks is strictly σ-partitionable and
(b) it can be used to construct a class of strictly σ-partitionable
networks. The theorem says the following. Let K = <C> be a network. If
<C, γ> is a group and K is strictly σ-partitionable, then <C, γ> must be
isomorphic to a direct product of groups. The problem of decomposing groups
into direct products has been studied extensively in group theory, so the
results derived in abstract algebra can be directly applied here.


8.1 Introduction


In this chapter the analysis of multistage networks will be addressed
[AdS82b, Bat74, Ben74, BoD72, Fen81, McS82]. This extends the work done
on single stage networks in previous chapters into the domain of multistage
interconnection networks. Although parts of the work done on single stage
networks are transferable to the domain of multistage networks, the
concepts are more complicated.

The  material  in  this  chapter  will  be  presented  as  follows.  A  vertical 
composition  of  networks  will  be  defined  and  its  properties  shown.  Using 
vertical  composition  and  the  model  of  single  stage  networks,  multistage 
networks  will  be  defined.  By  using  the  single  stage  model,  which  was  analyzed 
earlier,  as  a  building  block  for  multistage  networks,  some  results  from  the 
study  of  single  stage  networks  can  be  applied  to  multistage  networks.  The 
multistage  network  model  is  very  general  since  each  stage  can  be  a  completely 
general  single  stage  network.  This  model  differs  from  some  of  the  previous 
models  of  regular  multistage  networks  by  being  completely  general  and 
therefore  applicable  to  all  multistage  networks.  Several  examples  of 
applications  of  the  model  are  discussed. 




8.2  Overview 

The organization of the chapter is as follows. In Section 8.3 the problem
will be informally defined. In Section 8.4 the previous work will be
reviewed. In Section 8.5 the basic definitions such as vertical composition
of networks will be presented and its properties analyzed. In Section 8.6
the formal definition of multistage network is developed and several
examples of applications of the model are given. The chapter is summarized
in Section 8.7.


8.3  Problem  Statement 


In  this  chapter  a  formal  definition  of  multistage  networks  will  be 
developed.  First  a  vertical  composition  of  single  stage  networks  is  defined. 
Some  properties  of  the  composition  are  exhibited.  Then  multistage  networks 
are defined by the vertical composition of single stage networks where V_O
of the ith network is equal to the V_I of the (i+1)st network. The multistage
network  model  is  very  general  since  each  stage  consists  of  a  completely  general 
single  stage  network. 


Definition  8.5.1: 


Let C_p^1 ∈ C[V_I^1 × V_O^1] and C_q^2 ∈ C[V_I^2 × V_O^2]. Let
V_O^1 = V_I^2.
C_p^1 γ C_q^2 ≜ {<u_a,v_b> | <u_a,w_j> ∈ C_p^1, <w_j,v_b> ∈ C_q^2}.
Then γ is called a γ composition of I/O correspondences.

The Theorems, Lemmas and Definitions 8.5.2 to 8.5.11 discuss the
properties of the γ composition of I/O correspondences, which will be used
to define the network composition map. This map is the network composition
used to construct multistage networks, and its properties are determined by
the properties of the γ composition of I/O correspondences.

The following theorem shows that the γ composition of two nondestructive
I/O correspondences is a nondestructive I/O correspondence.

Theorem 8.5.2:

Let C_p^1 ∈ C[V_I^1 × V_O^1] and C_q^2 ∈ C[V_I^2 × V_O^2]. Let
V_O^1 = V_I^2. Then C_p^1 γ C_q^2 ∈ C[V_I^1 × V_O^2].

Proof: 

(1): Clearly C_p^1 γ C_q^2 ∈ P[V_I^1 × V_O^2], that is, C_p^1 γ C_q^2 is
an I/O correspondence over V_I^1 × V_O^2.

(2): Show C_p^1 γ C_q^2 ∈ C[V_I^1 × V_O^2], that is, C_p^1 γ C_q^2 is a
nondestructive I/O correspondence over V_I^1 × V_O^2.

Assume C_p^1 γ C_q^2 ∉ C[V_I^1 × V_O^2]
→ <u_a,w_b>, <u_c,w_b> ∈ C_p^1 γ C_q^2, that is, C_p^1 γ C_q^2 is not a
nondestructive I/O correspondence over V_I^1 × V_O^2.

(2.1): Case 1: <u_a,v_x>, <u_c,v_x> ∈ C_p^1 and <v_x,w_b> ∈ C_q^2
→ contradiction since C_p^1 ∈ C[V_I^1 × V_O^1], (that is, <u_a,v_x>,
<u_c,v_x> ∈ C_p^1 → C_p^1 is not nondestructive).

(2.2): Case 2: <u_a,v_x>, <u_c,v_y> ∈ C_p^1 and
<v_x,w_b>, <v_y,w_b> ∈ C_q^2
→ contradiction since C_q^2 ∈ C[V_I^2 × V_O^2], (that is, <v_x,w_b>,
<v_y,w_b> ∈ C_q^2 → C_q^2 is not nondestructive).

□ 
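The closure property of Theorem 8.5.2 can be checked exhaustively on a small instance. The sketch below assumes, following the two cases of the proof, that a correspondence is nondestructive when no output vertex is paired with two different inputs; the helper names and the two-element vertex sets are hypothetical.

```python
from itertools import chain, combinations, product


def gamma(c1, c2):
    """Gamma composition of two sets of (input, output) pairs."""
    return {(u, v) for (u, w) in c1 for (w2, v) in c2 if w == w2}


def is_nondestructive(c):
    """True if no output vertex is paired with two different inputs."""
    driver = {}
    for u, v in c:
        if driver.setdefault(v, u) != u:
            return False
    return True


def nondestructive_correspondences(inputs, outputs):
    """Enumerate every nondestructive correspondence over inputs x outputs."""
    pairs = list(product(inputs, outputs))
    subsets = chain.from_iterable(
        combinations(pairs, k) for k in range(len(pairs) + 1)
    )
    return [set(s) for s in subsets if is_nondestructive(s)]


def closed_under_gamma(v1, v2, v3):
    """Check that every composition of nondestructive pairs is nondestructive."""
    firsts = nondestructive_correspondences(v1, v2)
    seconds = nondestructive_correspondences(v2, v3)
    return all(is_nondestructive(gamma(a, b)) for a in firsts for b in seconds)


print(closed_under_gamma(["u0", "u1"], ["v0", "v1"], ["w0", "w1"]))  # True
```

An exhaustive check on one small instance is only a sanity test and does not replace the proof.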

The following theorem shows that the $\gamma$ composition of I/O correspondences is
associative.

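Theorem 8.5.3:

Let $C_p^1 \in C[V_I^1 \times V_O^1]$, $C_q^2 \in C[V_I^2 \times V_O^2]$, and $C_r^3 \in C[V_I^3 \times V_O^3]$.
Let $V_O^1 = V_I^2$ and $V_O^2 = V_I^3$.
Then $C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) = (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$.

Proof:

(1): Show $C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) \subseteq (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$.

(1.1): $\langle u_a, x_b\rangle \in C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3)$
$\rightarrow \langle u_a, v_c\rangle \in C_p^1$, $\langle v_c, x_b\rangle \in C_q^2 \,\gamma\, C_r^3$
$\rightarrow \langle v_c, w_d\rangle \in C_q^2$, $\langle w_d, x_b\rangle \in C_r^3$.

(1.2): $\langle u_a, v_c\rangle \in C_p^1$, $\langle v_c, w_d\rangle \in C_q^2$
$\rightarrow \langle u_a, w_d\rangle \in C_p^1 \,\gamma\, C_q^2$.

(1.3): $\langle u_a, w_d\rangle \in C_p^1 \,\gamma\, C_q^2$, $\langle w_d, x_b\rangle \in C_r^3$
$\rightarrow \langle u_a, x_b\rangle \in (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$
$\rightarrow C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) \subseteq (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$.

(2): Similarly can show $C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) \supseteq (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$.

(3): (1) and (2) $\rightarrow C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) = (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$.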

□ 

The following theorem shows that the $\gamma$ composition of I/O correspondences is
not commutative.

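Theorem 8.5.4:

Let $C_p^1 \in C[V_I^1 \times V_O^1]$ and $C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$ and
$V_O^2 = V_I^1$. Then $C_p^1 \,\gamma\, C_q^2 \neq C_q^2 \,\gamma\, C_p^1$ in general.

Proof:

For example, if $C_p^1 = \{\langle u_a, w_b\rangle\}$ and $C_q^2 = \{\langle w_b, v_c\rangle\}$, then
$C_p^1 \,\gamma\, C_q^2 = \{\langle u_a, v_c\rangle\}$, and $C_q^2 \,\gamma\, C_p^1 = \emptyset$.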

□ 
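Theorems 8.5.3 and 8.5.4 can be illustrated on single-pair correspondences in the style of the counterexample above. This is a sketch only: `gamma` is the same hypothetical helper name used for the relational composition, and the third correspondence `c3` is an assumption added to exercise associativity.

```python
def gamma(c1, c2):
    """Gamma composition of two sets of (input, output) pairs."""
    return {(u, v) for (u, w) in c1 for (w2, v) in c2 if w == w2}


c1 = {("ua", "wb")}
c2 = {("wb", "vc")}
c3 = {("vc", "xd")}  # assumed third stage, not part of the original example

# Associativity: both groupings yield the same correspondence.
left = gamma(c1, gamma(c2, c3))
right = gamma(gamma(c1, c2), c3)
print(left == right)  # True

# Non-commutativity: swapping the operands changes the result.
print(gamma(c1, c2))  # {('ua', 'vc')}
print(gamma(c2, c1))  # set()
```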

Definition  8.5.5: 

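Let $C_p^1 \in C[V_I^1 \times V_O^1]$ and $C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$,
$C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$.

$C_p^1 \,\gamma\, C^2 \triangleq \{C_p^1 \,\gamma\, C_q^2 \mid C_q^2 \in C^2\}$.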

The following theorem says that the $\gamma$ composition of a nondestructive
correspondence with a set of nondestructive correspondences produces a set of
nondestructive correspondences.


Theorem  8.5.6: 


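Let $C_p^1 \in C[V_I^1 \times V_O^1]$ and $C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$,
$C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$. Then $C_p^1 \,\gamma\, C^2 \subseteq C[V_I^1 \times V_O^2]$.

Proof:

From Theorem 8.5.2, $\forall\, C_q^2 \in C^2$, $C_p^1 \,\gamma\, C_q^2 \in C[V_I^1 \times V_O^2]$
$\rightarrow C_p^1 \,\gamma\, C^2 \subseteq C[V_I^1 \times V_O^2]$.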

□ 


The following theorem shows that the $\gamma$ composition of an I/O correspondence
with a set of I/O correspondences is not commutative.

Theorem 8.5.7:

Let $C_p^1 \in C[V_I^1 \times V_O^1]$ and $C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$,
$C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$ and $V_O^2 = V_I^1$.

Then $C_p^1 \,\gamma\, C^2 \neq C^2 \,\gamma\, C_p^1$ in general.

Proof: 

Similar  to  the  proof  of  Theorem  8.5.4. 


□ 


Definition  8.5.8: 

Let $C^1 = \{C_p^1 \mid p = 1,2,\ldots,m\}$, $C_p^1 \in C[V_I^1 \times V_O^1]$ and
$C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$, $C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$.

$C^1 \,\gamma\, C^2 \triangleq \{C_p^1 \,\gamma\, C_q^2 \mid C_p^1 \in C^1,\ C_q^2 \in C^2\}$. Then $\gamma$ is called a $\gamma$
composition of sets of I/O correspondences.

The following theorem says that the $\gamma$ composition of two sets of I/O
correspondences produces a set of I/O correspondences.




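Theorem 8.5.9:

Let $C^1 = \{C_p^1 \mid p = 1,2,\ldots,m\}$, $C_p^1 \in C[V_I^1 \times V_O^1]$ and
$C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$, $C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$.
Then $C^1 \,\gamma\, C^2 \subseteq C[V_I^1 \times V_O^2]$.

Proof:

Theorem 8.5.2 $\rightarrow C_p^1 \,\gamma\, C_q^2 \in C[V_I^1 \times V_O^2]$
$\rightarrow C^1 \,\gamma\, C^2 \subseteq C[V_I^1 \times V_O^2]$.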


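The following theorem shows that the $\gamma$ composition of sets of I/O
correspondences is associative.

Theorem 8.5.10:

Let $C^1 = \{C_p^1 \mid p = 1,2,\ldots,m\}$, $C_p^1 \in C[V_I^1 \times V_O^1]$,
$C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$, $C_q^2 \in C[V_I^2 \times V_O^2]$, and
$C^3 = \{C_r^3 \mid r = 1,2,\ldots,o\}$, $C_r^3 \in C[V_I^3 \times V_O^3]$. Let $V_O^1 = V_I^2$ and
$V_O^2 = V_I^3$.

Then $C^1 \,\gamma\, (C^2 \,\gamma\, C^3) = (C^1 \,\gamma\, C^2) \,\gamma\, C^3$.

Proof:

Theorem 8.5.3 $\rightarrow C_p^1 \,\gamma\, (C_q^2 \,\gamma\, C_r^3) = (C_p^1 \,\gamma\, C_q^2) \,\gamma\, C_r^3$
$\rightarrow C^1 \,\gamma\, (C^2 \,\gamma\, C^3) = (C^1 \,\gamma\, C^2) \,\gamma\, C^3$.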


The following theorem shows that the $\gamma$ composition of sets of I/O
correspondences is not commutative.

Theorem 8.5.11:

Let $C^1 = \{C_p^1 \mid p = 1,2,\ldots,m\}$, $C_p^1 \in C[V_I^1 \times V_O^1]$ and
$C^2 = \{C_q^2 \mid q = 1,2,\ldots,n\}$, $C_q^2 \in C[V_I^2 \times V_O^2]$. Let $V_O^1 = V_I^2$ and
$V_O^2 = V_I^1$.

Then $C^1 \,\gamma\, C^2 \neq C^2 \,\gamma\, C^1$ in general.


Proof: 

Apply  Theorem  8.5.7. 


□ 


In the following part the $\beta$-map will be defined and its properties studied.
The $\beta$-map is used to define multistage networks and its properties are based
on the properties of the $\gamma$ composition of I/O correspondences discussed earlier.

Definition  8.5.12: 

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1\rangle$ and $K^2 \in K[V_I^2 \times V_O^2]$,
$K^2 = \langle C^2\rangle$. Let $V_O^1 = V_I^2$.

Define the $\beta$-map as follows: $K^1 \,\beta\, K^2 = \langle C^1\rangle \,\beta\, \langle C^2\rangle =
\langle \{C_p^1 \,\gamma\, C_q^2 \mid C_p^1 \in C^1,\ C_q^2 \in C^2\}\rangle$.

The $\beta$-map describes the composition of networks where all outputs of the first
network $K^1$ are connected into (all) inputs of the second network ($V_O^1 = V_I^2$).
This is referred to as vertical composition of networks. This situation arises in
the construction of multistage networks and is motivated by existing multistage
networks such as the ADM, Cube, or STARAN network, where each stage may be
considered a network.
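A minimal sketch of the $\beta$-map, assuming a network is represented only by its set of correspondences (each a frozenset of (input, output) pairs). The names `beta`, `sources`, and `destinations`, and the two hypothetical 2-by-2 stages below, are illustrative and not taken from the thesis.

```python
from itertools import product


def gamma(c1, c2):
    """Gamma composition; frozensets allow results to be collected in a set."""
    return frozenset((u, v) for (u, w) in c1 for (w2, v) in c2 if w == w2)


def beta(k1, k2):
    """Beta-map: every correspondence of k1 composed with every one of k2."""
    return {gamma(c1, c2) for c1, c2 in product(k1, k2)}


def sources(k):
    """Inputs that appear in at least one correspondence of the network."""
    return {u for c in k for (u, _) in c}


def destinations(k):
    """Outputs that appear in at least one correspondence of the network."""
    return {v for c in k for (_, v) in c}


# Two hypothetical 2x2 stages, each offering a straight and an exchange setting.
stage1 = {frozenset({("a", "c"), ("b", "d")}), frozenset({("a", "d"), ("b", "c")})}
stage2 = {frozenset({("c", "e"), ("d", "f")}), frozenset({("c", "f"), ("d", "e")})}

network = beta(stage1, stage2)
print(len(network))                                  # 2
print(sorted(sources(network)), sorted(destinations(network)))
```

Four pairs of settings are composed, but only two distinct correspondences result, because straight followed by straight equals exchange followed by exchange. Collecting the results in a set reflects that the composed network is described by its distinct correspondences.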

The following theorem shows that the $\beta$ composition of two networks
results in a network over $V_I^1 \times V_O^2$.


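Theorem 8.5.13:

Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1\rangle$ and $K^2 \in K[V_I^2 \times V_O^2]$,
$K^2 = \langle C^2\rangle$. Let $V_O^1 = V_I^2$.

Let $K^1 \,\beta\, K^2 = \langle C^1\rangle \,\beta\, \langle C^2\rangle = \langle \{C_p^1 \,\gamma\, C_q^2 \mid C_p^1 \in C^1,\ C_q^2 \in C^2\}\rangle$.
Then $\langle \{C_p^1 \,\gamma\, C_q^2 \mid C_p^1 \in C^1,\ C_q^2 \in C^2\}\rangle \in K[V_I^1 \times V_O^2]$.

Proof:

(1): Theorem 8.5.9 $\rightarrow C^1 \,\gamma\, C^2 \subseteq C[V_I^1 \times V_O^2]$.

(2): Show $s(C^1 \,\gamma\, C^2) = V_I^1$.

$\forall\, u_a \in V_I^1\ \exists\, v_x \in V_O^1$ and $C_p^1 \in C^1$ such that $\langle u_a, v_x\rangle \in C_p^1$,

$\forall\, v_x \in V_I^2\ \exists\, w_b \in V_O^2$ and $C_q^2 \in C^2$ such that $\langle v_x, w_b\rangle \in C_q^2$

$\rightarrow \forall\, u_a \in V_I^1\ \exists\, C_p^1 \in C^1$ and $C_q^2 \in C^2$ such that $\langle u_a, w_b\rangle \in
C_p^1 \,\gamma\, C_q^2 \rightarrow s(C^1 \,\gamma\, C^2) = V_I^1$.

(3): Show $d(C^1 \,\gamma\, C^2) = V_O^2$.

$\forall\, w_a \in V_O^2\ \exists\, v_x \in V_I^2$ and $C_q^2 \in C^2$ such that $\langle v_x, w_a\rangle \in C_q^2$,

$\forall\, v_x \in V_O^1\ \exists\, u_b \in V_I^1$ and $C_p^1 \in C^1$ such that $\langle u_b, v_x\rangle \in C_p^1$

$\rightarrow \forall\, w_a \in V_O^2\ \exists\, C_p^1 \in C^1$ and $C_q^2 \in C^2$ such that $\langle u_b, w_a\rangle \in
C_p^1 \,\gamma\, C_q^2 \rightarrow d(C^1 \,\gamma\, C^2) = V_O^2$.

(4): Show $|C^1 \,\gamma\, C^2| \geq 2$.

(4.1): $|C^1| \geq 2 \rightarrow \exists\, C_p^1, C_r^1 \in C^1$, $C_p^1 \neq C_r^1$
$\rightarrow \exists\, \langle u_a, v_b\rangle \in C_p^1$, $\langle u_a, v_b\rangle \notin C_r^1$.

(4.2): $v_b \in V_O^1 \rightarrow v_b \in V_I^2 \rightarrow \exists\, C_q^2 \in C^2$ such that $\langle v_b, w_c\rangle \in C_q^2$.
$C_q^2 \in C[V_I^2 \times V_O^2] \rightarrow \langle v_x, w_c\rangle \notin C_q^2$, $v_x \neq v_b$.

(4.3): (4.1), (4.2) $\rightarrow \langle u_a, w_c\rangle \in C_p^1 \,\gamma\, C_q^2$, $\langle u_a, w_c\rangle \notin C_r^1 \,\gamma\, C_q^2$
$\rightarrow C_p^1 \,\gamma\, C_q^2 \neq C_r^1 \,\gamma\, C_q^2 \rightarrow |C^1 \,\gamma\, C^2| \geq 2$.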


The analysis of the partitionability of multistage networks will necessitate
the analysis of a network with a stage fixed at a given correspondence.
Consequently, the fixed stage no longer qualifies as a reconfigurable network as
originally defined. Therefore one cannot use the $\beta$-map to describe the vertical
composition of the reconfigurable stages and the fixed stage of the network.
To handle the problem, one could either define a new map, or use the $\beta$-map
with the understanding that the fixed stage is not a reconfigurable network.
The latter approach will be used here.

The following corollary shows that the $\beta$ composition of a network and a
fixed network stage results in a network over $V_I^1 \times V_O^2$.

Corollary  8.5.14: 

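Let $K^1 \in K[V_I^1 \times V_O^1]$, $K^1 = \langle C^1\rangle$ and $C_q^2 \in C[V_I^2 \times V_O^2]$,
$s(C_q^2) = V_I^2$, $d(C_q^2) = V_O^2$. Let $V_O^1 = V_I^2$.

Then $K^1 \,\beta\, C_q^2 = \langle C^1\rangle \,\beta\, C_q^2 = \langle \{C_p^1 \,\gamma\, C_q^2 \mid C_p^1 \in C^1\}\rangle
\in K[V_I^1 \times V_O^2]$.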

Proof: 

Similar  to  proof  of  Theorems  8.5.6  and  8.5.13. 




8.6  Multistage  Network  Model  and  Applications 

In  this  section  multistage  networks  will  be  formally  defined.  The 
definition  is  based  upon  the  examples  of  widely  known  multistage  networks 
such  as  ADM,  Cube,  STARAN,  and  others. 

Definition  8.6.1: 

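Let $K^r \in K[V_I^r \times V_O^r]$, $K^r = \langle C^r\rangle$, $r = 0,1,\ldots,t-1$ be a set of
interconnection networks. Let $V_O^r = V_I^{r+1}$, $r = 0,1,\ldots,t-2$. Then
$K = K^0 \,\beta\, K^1 \,\beta \cdots \beta\, K^{t-1}$ is a multistage network over $V_I^0 \times V_O^{t-1}$. Note that
$K$ can also be represented as $K = \langle C\rangle$, $C = C^0 \,\gamma\, C^1 \,\gamma \cdots \gamma\, C^{t-1}$.

Intuitively $K^r$ describes the $r$th stage of the multistage network.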
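Definition 8.6.1 amounts to folding the $\beta$-map over the list of stages. The sketch below assumes the same set-of-frozensets representation of a network; `multistage`, `two_by_two`, and the (stage, port) vertex labels are hypothetical. The last lines also show a stage fixed at a single correspondence, in the sense of Corollary 8.5.14, represented here as a one-element set.

```python
from functools import reduce
from itertools import product


def gamma(c1, c2):
    """Gamma composition of two correspondences (frozensets of pairs)."""
    return frozenset((u, v) for (u, w) in c1 for (w2, v) in c2 if w == w2)


def beta(k1, k2):
    """Beta-map on networks given as sets of correspondences."""
    return {gamma(c1, c2) for c1, c2 in product(k1, k2)}


def multistage(stages):
    """K = K^0 beta K^1 beta ... beta K^(t-1), folded left to right."""
    return reduce(beta, stages)


def two_by_two(r):
    """Hypothetical stage r: one 2x2 element with straight/exchange settings.

    Vertices are labelled (level, port) so outputs of stage r match the
    inputs of stage r + 1.
    """
    straight = frozenset({((r, 0), (r + 1, 0)), ((r, 1), (r + 1, 1))})
    exchange = frozenset({((r, 0), (r + 1, 1)), ((r, 1), (r + 1, 0))})
    return {straight, exchange}


free = multistage([two_by_two(r) for r in range(3)])

# Fix the middle stage at its straight setting; the other stages stay free.
fixed_stage = {frozenset({((1, 0), (2, 0)), ((1, 1), (2, 1))})}
partly_fixed = multistage([two_by_two(0), fixed_stage, two_by_two(2)])

print(len(free), len(partly_fixed))  # 2 2
```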

In this part some applications of the model of multistage networks will be
presented. Although parts of the research done on single stage networks are
transferable to the domain of multistage networks, the concepts are more
complicated. Examples of some artificially constructed networks will be given
in detail. The examples are constructed in such a way as to
illuminate the different types of partitionability of multistage networks.
Informally, partitionability in multistage networks is achieved by selecting
specific controls in some stages and letting all other stages dynamically select
their correspondences. Hereafter, the former stages will be referred to as fixed
and the latter as free stages. Although the material is presented for the case of
partitionability into two component networks, it is easily generalized to $r > 2$
component networks.


Example  8.6.2: 

Consider the following multistage network (Figure 8.1).

The network has the following functionality.

(1) There are two stages denoted by $G^0$, $G^1$.

(2) There are two switching elements $E_i^r$, $i = 0,1$ in each stage $r$.

(3) Each switching element $E_i^r$, $i = 0,1$ has the following functionality.
$V_I = \{a,b\}$, $V_O = \{c,d\}$, $C = \{C_0, C_1\}$, $C_0 = \{\langle a,c\rangle, \langle b,d\rangle\}$,
$C_1 = \{\langle a,d\rangle, \langle b,c\rangle\}$. (This is the same as the straight and
exchange settings, respectively, of a multistage Cube type
network [Law75].)

It can be shown that if in stage $G^0$, in $E_i^0$, $i = 0,1$ the $C_0$ is selected, then
the network can be partitioned into
$K^0 \in K[\{u_0,u_2\} \times \{w_0,w_2\}]$ and $K^1 \in K[\{u_1,u_3\} \times \{w_1,w_3\}]$.

Example 8.6.3:

Consider the following multistage network (Figure 8.2).

The network has the following functionality.

(1) There are two stages denoted by $G^0$, $G^1$.

(2) There is one switching element $E^r$ in each stage $r$.

(3) Each switching element has the following functionality.
$V_I = \{a,b,c,d\}$, $V_O = \{e,f,g,h\}$, $C = \{C_0, C_1\}$,
$C_0 = \{\langle a,e\rangle, \langle b,f\rangle, \langle c,g\rangle, \langle d,h\rangle\}$,
$C_1 = \{\langle a,f\rangle, \langle b,e\rangle, \langle c,h\rangle, \langle d,g\rangle\}$.

It can be shown that if in stage $G^0$ the $C_0$ is selected, then the network
can be partitioned into
$K^0 \in K[\{u_0,u_2\} \times \{w_0,w_2\}]$ and $K^1 \in K[\{u_1,u_3\} \times \{w_1,w_3\}]$.

Example  8.6.4: 

Consider the following multistage network (Figure 8.3).

The network has the following functionality.

(1) There are two stages denoted by $G^0$, $G^1$.

(2) There are two switching elements $E_0^0$, $E_1^0$ in stage $G^0$ and one
switching element $E_0^1$ in stage $G^1$.

(3) Each switching element in stage $G^0$ has the following
functionality. $V_I = \{a,b\}$, $V_O = \{c,d\}$, $C = \{C_0, C_1\}$,
$C_0 = \{\langle a,c\rangle, \langle b,d\rangle\}$, $C_1 = \{\langle a,d\rangle, \langle b,c\rangle\}$.

(4) The switching element in stage $G^1$ has the following functionality.
$V_I = \{a,b,c,d\}$, $V_O = \{e,f,g,h\}$, $C = \{C_0, C_1\}$,
$C_0 = \{\langle a,e\rangle, \langle b,f\rangle, \langle c,g\rangle, \langle d,h\rangle\}$,
$C_1 = \{\langle a,f\rangle, \langle b,e\rangle, \langle c,g\rangle, \langle d,h\rangle\}$.

It can be shown that if in stage $G^1$, in the switching element $E_0^1$, the $C_0$
is selected, then the network can be partitioned into
$K^0 \in K[\{u_0,u_1\} \times \{w_0,w_1\}]$ and $K^1 \in K[\{u_2,u_3\} \times \{w_2,w_3\}]$.

Example  8.6.5: 

Consider the following multistage network (Figure 8.4).

(1) There are two stages denoted by $G^0$, $G^1$.

(2) There is one switching element $E_0^0$ in stage $G^0$ and two switching
elements $E_i^1$, $i = 0,1$ in stage $G^1$.

(3) The switching element $E_0^0$ has the following functionality.
$V_I = \{a,b\}$, $V_O = \{c,d\}$, $C = \{C_0, C_1\}$, $C_0 = \{\langle a,c\rangle, \langle b,d\rangle\}$,
$C_1 = \{\langle a,d\rangle, \langle b,c\rangle\}$.

(4) The switching elements $E_0^1$, $E_1^1$ have the following functionality.
$V_I = \{a\}$, $V_O = \{b,c\}$, $C = \{C_0, C_1\}$, $C_0 = \{\langle a,b\rangle\}$,
$C_1 = \{\langle a,c\rangle\}$.

It can be shown that if in stage $G^0$ the $C_0$ is selected, then the network
can be partitioned into $K^0 \in K[\{u_0\} \times \{w_0,w_1\}]$ and
$K^1 \in K[\{u_1\} \times \{w_2,w_3\}]$. Instead of fixing the setting in stage $G^0$,
consider the setting of the switches in stage $G^1$. If in stage $G^1$, in switching
elements $E_0^1$, $E_1^1$, either the $C_0$ or $C_1$ is selected, then the network is not
partitionable due to the following. If $C_0$ is selected then $w_1$ and $w_3$ are not
accessible, and if $C_1$ is selected then $w_0$ and $w_2$ are not accessible in any state.

To summarize the information from the examples, the following is
essential for a multistage network to be partitionable. The network must have
more than one stage. There must be at least one stage such that if one state in
that stage is selected, two data path independent (and possibly control
independent) networks are generated. The two subnetworks must have $V_I^0$, $V_I^1$
and $V_O^0$, $V_O^1$ such that $V_I^0 \cup V_I^1 = V_I$ and $V_O^0 \cup V_O^1 = V_O$. It is not essential
that the controls of the generated subnetworks are independent of each other.
It is not essential that each subnetwork will be again partitionable, although it
is an interesting subclass.


9.1 Introduction


This is a study of a network design to support interprocessor data
communications in a proposed real-time distributed, digital signal processing
system structure [SeS84c]. The system is general in nature in that it may be used
in a wide variety of signal processing applications, such as Finite Impulse
Response (FIR) filters, FFT and beamforming. Fault tolerance is a significant
design issue for this system. In particular, the ability to reallocate distributed
processing resources with minimal human intervention is important in order to
maintain a functioning system, although possibly somewhat degraded in
performance. The overall design of the system reflects its fault tolerance and
generality.

Figure 9.1 shows the signal data transfer parameters for three iterations of
the evolutionary distributed signal processing system. These parameters are
based on expectations for this type of system and are used as guidelines for the
design work in this study. Three phases, A, B, and C, are indicated in the
figure. Phase A is the 1985 time-frame, Phase B is the 1990 time-frame, and
Phase C is the 1995 time-frame. Seven functional sets of devices are shown:
preprocessor, signal conditioner, signal processor, general purpose processor,
tape storage, disk storage, and operator console.

The top three rows of parameters in the figure indicate the number of PEs
(processing elements) in each functional set for each phase of development
(note that for tape and disk storage the "PEs" refers to storage devices). The
notation "x(@y)" means "x" devices, each with "y" times the capability of the
similar device used in the previous phase (as a result of technology insertion).


Figure 9.1: Signal data transfer parameters for three iterations of an
evolutionary distributed processing system. (For Phases A, B, and C: number
of processing elements (PEs) in each functional set, and total bandwidth per
functional group in Mbytes/sec.)


There  are  two  classes  of  data  communication  paths.  The  solid  lines  are 
used  for  data  processing.  The  dotted  lines  are  used  for  the  preprocessor  to 
send  the  input  data  signals  to:  (1)  tape  storage  to  create  history  files  for 
possible  later  off-line  processing,  and  (2)  the  operator  console  for  monitoring 
purposes. 

The parameters in rows four through six indicate the expected average
"total bandwidth per functional set" for each of the three phases. The
numbers correspond to the connection above them represented by an arrow in
the figure. The average bandwidth per PE in a functional set is calculated by
dividing the total average bandwidth by the number of PEs. For example, the
preprocessor to signal conditioner average bandwidth is 2 Mbytes/sec per PE
for both Phases A and B. The peak bandwidth is approximated by two times the
average bandwidth. For example, the preprocessor to signal conditioner
peak bandwidth is 4 Mbytes/sec per PE for both Phases A and B.

Our study of the data transfer network for this system will focus on the
preprocessor to signal conditioner to signal processor communications. The
distance between the preprocessor and signal conditioner, as well as between
the signal conditioner and signal processor, is expected to be on the order of
five feet. These functional sets will most likely share a single cabinet. The
entire system will most likely fit in a rectangular area of approximately 40 feet
by 60 feet.

The following are assumptions used in later sections about the expected
data communications between the preprocessor and signal conditioner, and
between the signal conditioner and signal processor. Note that for this study
the network is not required to provide communications among the PEs within a
functional set.




The communication patterns between the PEs in adjacent functional sets
are predetermined before execution begins. Thus, each PE in a functional set
knows  to  which  PE(s)  to  send  data  in  the  next  functional  set.  In  case  of  faults 
in  the  system,  once  the  fault  is  detected  the  system  control  program  reallocates 
tasks  to  the  PEs  and  modifies  the  associated  connection  requirements.  Each 
relevant  PE's  program  is  updated,  appropriate  program  rollback  and  restart 
procedures  are  performed,  and  execution  continues. 

Communications between functional sets will be from a fixed group of four
PEs in the sending set to a fixed group of four PEs in the receiving set. This is
demonstrated in Figure 9.2. The communication between a group of four
sending PEs and a group of four receiving PEs can be one-to-one, many-to-one,
or one-to-many. The one-to-one connection implies each sending PE is connected
to only one receiving PE (and so each receiving PE is connected to only one
sending PE). Note that the pairing of a sending PE to a receiving PE is arbitrary.
This one-to-one pairing is expected to be the predominant mode of operation.
The many-to-one connection implies that more than one PE in the sending
group transmits data to the same PE in the receiving group, i.e., two-to-one,
three-to-one, or four-to-one. This mode would be used in case there are one or
more faulty PEs in the receiving group, or if the computational task being done
by the sending PEs required the use of multiple PEs in order to prepare data
for a single PE in the next functional set. This mode is expected to occur
infrequently. The one-to-many connection implies that one PE in the sending
group transmits data to multiple PEs in the receiving group, i.e., one-to-two,
one-to-three, or one-to-four. This mode would be used in case there are one or
more faulty PEs in the sending group, or if the computational task being done
by the receiving PEs required the use of multiple PEs in order to process data
from a single sending PE. This mode is also expected to occur infrequently.
The modes can be combined, e.g., a two-to-one connection, a one-to-two
connection, and a one-to-one connection (between one pair of PEs) can be
established simultaneously if no PEs are faulty. To summarize the connection
patterns between functional groups: the most common mode of operation
expected is the one-to-one pattern among four sending PEs and four receiving
PEs (arbitrarily paired), but the network should also be capable of efficiently
supporting one-to-many and many-to-one connections, as well as combinations
of all three patterns.

Figure 9.2: A signal data switching configuration for front end PEs.
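The three connection modes can be told apart from the fan-out of each sending PE and the fan-in of each receiving PE. The following is a minimal sketch under assumed conventions: links are (sender, receiver) pairs with PEs numbered 0 to 3 inside a group, and `classify` is a hypothetical helper rather than part of the proposed design.

```python
from collections import defaultdict


def classify(connections):
    """Label each (sender, receiver) link with the mode it takes part in.

    A link whose sender feeds several receivers is "one-to-many"; otherwise a
    link whose receiver is fed by several senders is "many-to-one"; any
    remaining link is "one-to-one". A link that is both fan-out and fan-in is
    reported as "one-to-many" (an arbitrary choice made for this sketch).
    """
    fan_out = defaultdict(set)
    fan_in = defaultdict(set)
    for sender, receiver in connections:
        fan_out[sender].add(receiver)
        fan_in[receiver].add(sender)

    labels = {}
    for sender, receiver in connections:
        if len(fan_out[sender]) > 1:
            labels[(sender, receiver)] = "one-to-many"
        elif len(fan_in[receiver]) > 1:
            labels[(sender, receiver)] = "many-to-one"
        else:
            labels[(sender, receiver)] = "one-to-one"
    return labels


# The combined case from the text: two-to-one, one-to-two, and one-to-one.
links = [(0, 0), (1, 0), (2, 1), (2, 2), (3, 3)]
for link, mode in sorted(classify(links).items()):
    print(link, mode)
```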

Data transfer between PEs in different functional sets will be overlapped
with computation. Consider the example shown in Figure 9.3, where each PE
is connected to one PE in the next functional group. Shown below each PE is
its three bank swinging buffer memory: one bank for data currently being
operated upon, one bank for storing data previously generated by that PE (and
currently being sent to the next PE), and one bank to receive data currently
being sent by the previous PE (for processing after the current data set has
been processed) [Dem83]. Each bank is a physically separate memory of 64K
words. Thus, each PE is effectively sending a data set, processing a data set,
and receiving a data set simultaneously. For example, consider the data sets in
the figure using the signal conditioner PE's swinging buffers. Data set E is
being sent by the preprocessor PE (which previously generated it) to the signal
conditioner (which will process it after it finishes processing data set D). Data
set D is currently being processed by the signal conditioner PE. Data set C is
being sent from the signal conditioner PE (which previously generated it) to
the signal processor PE (which will process it after it finishes processing data
set B). The transmission of data sets A, C, E, and G, and the processing of
data sets F, D, and B, are all occurring simultaneously. The time to perform
these simultaneous transmissions and computations is called an interval. In the
next interval, data sets B, D, and F will be transmitted, and data sets A, C,
and G will be processed. Similarly, in the interval prior to the one shown in
the figure, data sets A, C, and E were processed, and data sets B, D, and F
were transmitted. In summary, data sending, receiving, and processing occur
simultaneously for the PEs, as shown in Figure 9.3, through the use of three-way
swinging buffers, and an interval is the time required for a PE to
simultaneously receive a data set, transmit a data set, and process a data set
(such as the signal conditioner PE does with data sets E, C, and D,
respectively, in the figure). It is assumed that, in general, the time to process a
data set is longer than the time to transmit or receive a data set, and therefore
determines the length of the interval.

Figure 9.3: PEs and their associated swinging buffered memories.
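The rotation of the three banks can be sketched as follows. The sketch assumes, consistent with the description above, that a bank cycles through the roles receive, process, send in successive intervals; the bank numbering and the function `roles_at` are hypothetical.

```python
ROLES = ("receive", "process", "send")


def roles_at(interval):
    """Role of each of the three banks of one PE during a given interval.

    Bank b receives in interval b, is processed in interval b + 1, is sent in
    interval b + 2, and then the cycle repeats.
    """
    return {bank: ROLES[(interval - bank) % 3] for bank in range(3)}


for t in range(4):
    print(t, roles_at(t))
```

In every interval each role is held by exactly one bank, which is what lets a PE receive, process, and send three different data sets at the same time.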

The amount of data sent by a single PE is expected to be a block of a
minimum of 1K words and a maximum of 64K words. A number of source
PEs  can  send  data  to  any  of  the  destination  PEs  (as  specified  by  the 
connectivity);  each  destination  PE,  however,  receives  data  from  at  most  one 
source  at  any  given  time.  Multiple  sources  send  data  to  a  common  destination 
in  a  multiplexed  fashion  (in  a  predetermined  static  way)  so  that  each  source 
can  send  its  data  without  contention. 
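One way to picture the predetermined static multiplexing is a repeating table of time slots in which each destination listens to exactly one of its sources. The round-robin slot assignment below is an assumption made for illustration; the text does not specify how slots are allotted. It uses `math.lcm`, which requires Python 3.9 or later.

```python
from math import lcm


def tdm_schedule(fan_in):
    """Build a static, repeating time-slot table.

    fan_in maps each destination PE to the ordered list of source PEs that
    send to it. Sources of a destination take turns round-robin, so in every
    slot each destination receives from exactly one source.
    """
    period = lcm(*(len(sources) for sources in fan_in.values()))
    return [
        {dest: sources[slot % len(sources)] for dest, sources in fan_in.items()}
        for slot in range(period)
    ]


# Hypothetical group: sources 0 and 1 share destination 0; source 2 feeds
# destinations 1 and 2; source 3 feeds destination 3.
schedule = tdm_schedule({0: [0, 1], 1: [2], 2: [2], 3: [3]})
for slot, assignment in enumerate(schedule):
    print(slot, assignment)
```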

In summary, the data communications will be between the "swinging
memory buffers" associated with the PEs in the system. The requirements are
that  communications  will  be  among  groups  of  four  source  PEs  and  four 
destination  PEs.  Four  approaches  were  considered:  multistage  based  networks, 
ring  based  networks,  shared  bus,  and  crossbar  based  networks.  As  a  result  of 
the analyses presented in [SiM84], crossbar based networks were selected as the
method of choice for data communications in the system. Multistage
($\log_2 N$ stage) networks, such as the cube [AdS82b, SiM81], were not chosen
because  to  establish  connections  between  just  four  (or  eight)  source  PEs  and 
four  (or  eight)  destination  PEs  the  crossbar  is  more  flexible,  and,  given  current 
technology,  cost-effective.  Ring  based  networks  were  eliminated  because  if 
most  communications  are  1:1  (one  source  PE  to  one  destination  PE)  and  occur 
simultaneously,  the  parallel  paths  provided  by  the  crossbar  make  it  more 
suitable.  Shared  bus  networks  were  eliminated  because  the  PE’s  swinging 
buffers  could  not  load  data  onto  them  fast  enough  for  them  to  operate  at  the 
desired  Phase  B  bandwidth.  In  this  work,  the  characteristics  of  a  crossbar 
chip,  the  organization  of  these  chips  for  fault  tolerance,  and  the  way  in  which 
the  crossbar  based  network  can  be  interfaced  to  the  processors  are  described. 

An  8-by-8  design  is  proposed  instead  of  a  4-by-4  design  to  provide  extra  load 
balancing  capabilities  when  faults  occur. 


9.2 Overview


In Section 9.3 the problem is informally defined. In Section 9.4 the basic
terms are defined. In Section 9.5 the buffer to network interface is discussed.
Several architectures at the chip level are analyzed in Section 9.6. Section 9.7
evaluates four network architectures. Finally, in Section 9.8, various fault
detection and recovery methods (at the system level) are presented. In Section
9.9 the conclusions are presented.




9.3 Problem Definition


The network design presented is based on the preprocessor/signal
conditioner communication requirements. These requirements are greater than
those for the signal conditioner/signal processor, both in terms of throughput
and number of processors. However, they are similar enough that it appears
best to use the same network design in both cases. Since the
preprocessor/signal conditioner requirements are stricter, these will be used to
guide the network design. Any capabilities included for the preprocessor/signal
conditioner communications but not needed for the signal conditioner/signal
processor communications can be adapted to provide additional fault tolerance.

The  problem  is  to  design  an  interconnection  network  for  a  distributed 
signal  processing  system  satisfying  the  following  specifications.  The 
specifications  here  are  based  on  the  expectations  of  the  way  in  which  such  a 
system  may  operate. 

In this section, the requirement of network extendibility to a larger
number of PEs is described. The data communication is between a set of
source processors and a set of destination processors. Each set corresponds to a
common functional specification, such as the set of processors used as
preprocessors. The data movement is unidirectional from a source processor to
a subset of destination processors. The processors are addressed by distinct
consecutive integers in each set separately. The communication requirements
specify that a fixed group of four source PEs in one set be allowed to send data
to a fixed group of four destination PEs in another set. The subset of four
processors  addressed  by  a,  where 


will  be  called  group  i. 

In Phase A (1985) the system will have two groups in the source set and
two groups in the destination set. In Phase B (1990) the system will have four
groups in the source set and four groups in the destination set. Phase C (1995)
of the system will not be discussed here since we feel that technology will have
changed so much by then that it is better to concentrate our efforts in this
section on Phases A and B. It is desirable for a single conceptual design to be
applicable to both Phases A and B.

In  this  paragraph  the  throughput  requirements  of  the  network  are 
specified. 

For  phase  A: 

(a)  Average:  2  Mbyte/sec/PE  with  8  PEs  active 

(b)  Peak:  4  Mbyte/sec/PE  with  8  PEs  active 

For  phase  B: 

(a)  Average:  2  Mbyte/sec/PE  with  16  PEs  active 

(b)  Peak:  4  Mbyte/sec/PE  with  16  PEs  active 

Since  the  physical  distance  between  the  set  of  sources  and  the  set  of 
destinations  is  expected  to  be  on  the  order  of  five  feet,  it  will  be  assumed  that 
throughput  of  a  single  wire  is  1  Mbyte/sec/wire.  Using  a  crossbar  based 
network,  a  4-bit  network  word  is  sufficient  to  handle  the  peak  load  of  each  PE 
(4  Mbyte/sec)  for  both  Phases  A  and  B. 
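The sizing argument can be restated as a short calculation. The sketch assumes that each bit of the network word travels on its own wire at the stated 1 Mbyte/sec/wire; the function names are hypothetical.

```python
from math import ceil

WIRE_MBYTES_PER_SEC = 1  # assumed throughput of a single wire, from the text


def wires_needed(peak_mbytes_per_sec):
    """Smallest number of parallel wires that carries the peak load of one PE."""
    return ceil(peak_mbytes_per_sec / WIRE_MBYTES_PER_SEC)


def aggregate(per_pe_mbytes_per_sec, active_pes):
    """Total load over all active PEs, in Mbytes/sec."""
    return per_pe_mbytes_per_sec * active_pes


print(wires_needed(4))   # 4 wires, i.e. a 4-bit network word
print(aggregate(2, 8))   # Phase A average: 16 Mbytes/sec
print(aggregate(4, 8))   # Phase A peak:    32 Mbytes/sec
print(aggregate(2, 16))  # Phase B average: 32 Mbytes/sec
print(aggregate(4, 16))  # Phase B peak:    64 Mbytes/sec
```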

The  following  are  the  interconnection  function  requirements.  The  required 
data  communication  is  only  from  group  i  of  a  set  of  source  PEs  to  group  i  of  a 


set  of  destination  PEs  (four  source  PEs  and  four  destination  PEs).  The 
functions  are  as  follows  within  each  group.  PE  k  can  send  data  to  any  subset 
of  the  destination  group.  Any  number  of  source  PEs  can  send  data  to  any  of 
the  destination  PEs  as  long  as  each  destination  PE  is  getting  data  from  at  most 
one  source  at  the  same  time.  (Note  that  when  multiple  sources  send  data  to 
the  same  destination  time  division  multiplexing  is  used  so  that  each  source  can 
send its data without contention.) The amount of data sent by a single
processor is expected to be a minimum of 1K words and a maximum of 64K
words.

Fault detection and recovery is a salient issue of the design. Soft
faults are transient and temporary. An important requirement is that soft
faults occurring in control messages (e.g., message header, chip control signals)
will be detected. It is not as important to protect data information from soft
errors, as they can normally be treated as additive noise. If desired, parity bits
or error-detection/error-correction bits could be added. Hard faults are
permanent. Therefore, hard faults occurring in control and data
communications must be detected as early as possible. In summary, the system
should be able to recover from as many soft and hard errors as possible,
perhaps with some loss of functionality or throughput.

Another  important  requirement  is  that  the  cost  of  the  implementation  will 
be  low.  The  cost  categories  are:  (a)  number  of  chips;  (b)  number  of  distinct 
types  of  chips;  and  (c)  wiring  complexity  between  the  chips  of  the  network. 


9.4 Basic Concepts


In this section some terminology that is used throughout this work is
presented.

Transmission dialog: the action of a processor transmitting all the data
contents of its buffer to perhaps multiple destinations. A transmission dialog
consists of a number of transmission blocks.

Transmission block: an uninterrupted transmission (of 128 to 1K bytes), a
component of a transmission dialog.

Data interconnection network: the hardware dedicated to the transmission of
data from sources to destinations.

Report interconnection network: the hardware dedicated to the transmission of
status and error reports from destinations to sources.

PE:  processing  element  or  processor. 

DMA: direct memory access hardware - the hardware that controls the state
of the network and is responsible for the details of the transmission dialog.

Source PE: processor designated so by being the source of data transmitted
through the data network.

Destination PE: processor designated so by being the destination of the data
transmitted through the data network.

SDMA: the DMA interfacing the source PE to the input of the data network.

DDMA: the DMA interfacing the output of the data network to the
destination PE.

Network port: the input (output) pins of the network dedicated to a single
processor.

Network path: the reconfigurable hardware between the source DMA of a single
processor and the destination DMA of a single processor.

Network bit path: a single one-bit wide component of the network path.

Network word: the word consisting of the functioning bit paths (per network
path).

System word: a 16-bit word, also called “word.”

Input  wire:  is  a  wire  connection  from  output  of  the  source  DMA  to  the  I/O  pin 
at  the  input  of  a  chip  of  the  network. 

Output  wire:  is  a  wire  connection  from  the  I/O  pin  at  the  output  of  a  chip  of 
the  network  to  the  input  of  the  destination  DMA. 

Middle  wire:  is  a  wire  connection  between  the  chips  of  the  network. 

Chip  data  line:  is  the  path  that  data  uses  inside  a  network  chip. 

Chip control: is the path and logic the control uses inside a network chip.


9.5 DMA - Direct Memory Access

This section describes the specialized DMA chips or logic needed to
interface the swinging buffers to the network.

Here the source DMA functions (see Figures 9.4 and 9.5) will be described.

(1) Buffer interface: The logic that interfaces to the swinging output buffer.

(2) Data formatter: The conversion of 16-bit words into the network word.
Network word width is determined by the number of nonfaulty bit paths per
network port.


Figure 9.4:

The architecture of the communication system: D - data, C - control,
R - report, AS - PE source address, AD - PE destination address.


Figure 9.5:

Source DMA architecture.




(3) Block counter: The block counter is loaded by the source PE. It is
normally initialized to the number of transmission blocks per dialog, and
decremented each time a block is transferred. For diagnostic purposes the PE
has read/write capabilities.

(4) PE status: The hardware contains a PE status table. The table consists of
K registers. Register i contains the status of the destination processor i,
0 ≤ i < K (where K is the number of processors in the group (see Section
9.3)). This table is used as follows. Every time a destination PE receives a
block of data, it will return a status report. The system monitor can read the
status table and monitor the correctness of the operations.

(5) Header generation: The header generation logic will construct a header as a
triple (i,j,k), where i is the logical source address, j is the logical destination PE
address, and k is the number of remaining blocks in the current transmission
dialog. The fault tolerant extension of this minimal header is discussed in
Section 9.8.

(6)  Header  encoder:  The  header  encoder  logic  will  encode  the  header  using 
some  error  correction  code  (e.g.,  CRC)  to  protect  against  soft  errors. 

(7)  Diagnostics:  The  diagnostic  logic  is  used  for  diagnosis  of  the  network.  The 
SDMA  periodically  will  try  to  test  the  network  and  all  the  destination 
processors  for  faults.  The  diagnostic  logic  will  also  report  to  the  system 
monitor  any  destination  PE  that  does  not  function  properly. 

(8)  Parity  generator:  The  parity  generator  will  generate  parity  bits  for  each 
network  word. 


(9) Data network interface: The data network interface logic will send the
network word (with optional parity) to the network input port.
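
The parity generation step in item (8) can be illustrated with a short Python sketch. It is only an illustration: the choice of even parity, a single parity bit per network word, and the function names are assumptions, since the text does not fix the parity scheme.

```python
def parity_bit(bits):
    """Return the XOR of all bits (1 if the number of 1s is odd)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def add_parity(network_word):
    """Append one even-parity bit to a network word (list of 0/1 bits)."""
    return list(network_word) + [parity_bit(network_word)]


def parity_ok(received):
    """A word with its even-parity bit must contain an even number of 1s."""
    return parity_bit(received) == 0
```

A single-bit error on any bit path flips the overall parity and is therefore detected by the check at the receiving side; an even number of errors in the same word would go unnoticed.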

Here the destination DMA functions (see Figures 9.4 and 9.6) will be
discussed.

(1)  Data  network  interface:  This  logic  accepts  data  from  the  network  output 
port. 

(2) Parity check: This logic checks for correct parity of the network word.

(3) Data deformatter: This logic converts the data format from the network
word format to the 16-bit word format.

(5)  Header  decoder:  This  logic  decodes  the  header  which  was  encoded  by  the 
error  correction  code  at  the  source  PE. 

(6) Header check: This logic will check the source, destination, and block count
fields for inconsistencies (this is discussed further in Section 9.8).

(7) Soft/hard error: This logic will determine whether a soft or
hard error occurred in the network. It will do so by counting parity errors and
using information about header errors. If it is a hard error, the DDMA will
notify the SDMA, which will then reconfigure the bus or run some diagnostics
to identify the exact error. The system monitor will be notified.

(9) Buffer interface: This logic sends the data, which is now in the 16-bit word
format, to the destination buffer (see Figure 9.4).
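
As an illustration of the data formatter and deformatter, the following Python sketch splits a 16-bit system word into network words carried only on the nonfaulty bit paths and reassembles it at the destination. The function names, the most-significant-bit-first ordering, and the zero padding of the last network word are assumptions made for this sketch; the text does not specify them.

```python
def format_word(word16, good_paths):
    """Split a 16-bit system word into network words.

    good_paths lists the indices of the nonfaulty bit paths of the port.
    Each network word is a dict {bit_path_index: bit}.
    """
    if not good_paths:
        raise ValueError("no functioning bit path left on this port")
    bits = [(word16 >> (15 - n)) & 1 for n in range(16)]  # MSB first (assumed)
    width = len(good_paths)
    network_words = []
    for start in range(0, 16, width):
        chunk = bits[start:start + width]
        chunk += [0] * (width - len(chunk))  # zero padding (assumed)
        network_words.append(dict(zip(good_paths, chunk)))
    return network_words


def deformat_word(network_words, good_paths):
    """Rebuild the 16-bit system word from the received network words."""
    bits = [nw[p] for nw in network_words for p in good_paths]
    word16 = 0
    for b in bits[:16]:  # drop any padding bits
        word16 = (word16 << 1) | b
    return word16
```

With all four bit paths working, one system word occupies four network words; if one bit path is excluded, the same word occupies six network words, which is the throughput cost of continuing to operate after the fault.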




Figure  9.6: 

Destination  DMA  architecture. 




9.6 Architecture of the Fault Tolerant Crossbar

In this section two architectures of a fault tolerant crossbar will be
presented, type I and type II. There are many ways to partition an nxmxk
(n-input, m-output, k-bit wide) crossbar network into chips subject to available
I/O pins and other constraints. One way to partition the network is to use bit
slicing. Here the desired network is implemented using a number of network
planes. Each plane would have the same number of interconnection ports but
would have a smaller bit path. For example, a 4x4x8 crossbar can be
implemented using this approach with four 4x4x2 crossbar chips as illustrated
in Figure 9.7. A second approach is to build the larger network with a set of
smaller networks. Here the desired network is obtained by essentially
cascading a set of subnetworks. An example of how a 4x4x8 crossbar can be
implemented using this approach with four 2x2x8 crossbar chips is illustrated
in Figure 9.8.

Here the partition selected is based upon the important reliability criteria.
The type I and type II chips are implemented as bit slices since that minimizes
the number of chip-to-chip connections compared to the cascading approach.
These connections slow down the signal and more importantly force the bit
path through many soldering joints (an unreliable element).

Different chip architectures for nxm crossbars are discussed in [MaM81a,
McT82]. Our design differs from these in order to support the communication
requirements of this particular application. The differences include our lack of
a collision detection mechanism (due to the deterministic nature of the
communications), our addition of fault tolerance, and our use of serially loaded
control bits (since the overhead is negligible compared to the length of the
transmission block).

In this section the architecture of the 4x4 type I crossbar chip (see
footnote (1)) is described. The chip must satisfy the interconnection
requirements described in Section 9.3. Also, it must be highly fault tolerant;
for example:

(a)  A  faulty  section  can  be  localized  and  disconnected  from  the  rest  of  the 
properly  functioning  chip. 

(b) The pins available allow different methods of controlling the chip. It is up
to the logic designer to decide which method satisfies any specific set of
requirements.

(c)  There  are  two  paths  for  all  data  lines  on  the  substrate. 

In this system the interconnection functions are restricted to functions
from a single group i of four source PEs to a single group i of four destination
PEs; thus the pin limitation based design methodology discussed in [FrW81,
FrW82] is not relevant, since it applies to networks of size 64x64 or larger.
Also, because here the concern is with 4x4 crossbars, the finite state automata
type implementations as discussed in [WaF83] will not be applicable, especially
since the fault tolerance of the implementation is the most important aspect of
the design.

Figure 9.9 shows a block diagram of a type I chip. The pin functionality
is as follows:

(1) 4x4 is described here for pedagogical reasons; however, the design is applicable to rxr
crossbars as well.




DI j: Data input for input port j

CTI j: Control register input for input port j

CTK j: Control clock input for input port j

DO j: Data output for output port j

RES: Reset input, will reset all CTRGs (control registers) to zero

Power supply: two physically distinct pins (not shown on figure)

G: Ground, two physically distinct pins (not shown on figure)

The number of functional pins in a type I chip is as follows: input port i:
three pins (DI i, CTI i, and CTK i); output port j: one pin (DO j); and reset:
one pin (RES). For a 4x4x1 crossbar the total number of signal (control and
data) lines, which does not include RES, is 4x4 = 16 (4xN for an NxNx1
crossbar). For an 8x8x1 crossbar the total number of signal lines is 8x4 = 32.
Assuming there can be up to 80 signal pins on a VLSI chip using VHSIC
technology, four 4x4x1 crossbars, each with its own control and reset for fault
tolerance purposes, can be implemented on a single chip, yielding a 4x4x4
crossbar. Similarly, an 8x8x2 crossbar chip could be constructed.

There  are  several  methods  of  controlling  the  port. 

(a)  The  processor  that  sends  the  data  to  an  input  port  can  be  the  same 
one  that  sets  up  the  controls  for  that  port  (better  from  a  reliability 
point  of  view). 

(b)  The  chip  control  is  given  to  the  system  control  unit. 

We  will  assume  the  processor  sending  the  data  controls  the  input  port  (i.e.,  sets 
the  port’s  CTRG). 




The port functionality of a type I chip is as follows. The CTRG j must be
loaded with control information. Port j can be in one of two states, the
enabled state or the disabled state. If input port j is enabled, and bi = 1 in
CTRG j, for some fixed i, 0 ≤ i ≤ 3, then the data from DI j will propagate to
DO i. It is possible to have any subset of bits set in CTRG j. If input port j is
disabled, then input port j data will not get propagated to any output port. A
special control bit b4 is in each CTRG for fault tolerance reasons. If b4 = 1 in
CTRG j then input port j + 1 modulo 4 is disabled. This allows a PE to
“disconnect” another PE which is faulty. The usage of the b4 bit is discussed
later. There is no need for contention logic since the SDMA will know which
destination processors are available (as discussed in Section 9.1).
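
The port behavior just described can be sketched as a small behavioral model of one 1-bit slice. This is a simplified illustration, not the chip logic itself: representing each CTRG as a Python list [b0, b1, b2, b3, b4], merging inputs onto an output with a logical OR, and treating b4 of the neighboring CTRG as the only source of the disabled state are all assumptions of the sketch.

```python
N = 4  # ports per crossbar slice


def enabled_ports(ctrg):
    """Input port j is disabled when b4 = 1 in CTRG (j - 1) modulo N."""
    return [ctrg[(j - 1) % N][4] == 0 for j in range(N)]


def outputs(ctrg, di):
    """Return [DO 0, ..., DO 3] for one 1-bit slice.

    ctrg[j] = [b0, b1, b2, b3, b4] is the control register of input port j;
    di[j] is the bit present on DI j. Merging onto DO i is modeled as a
    logical OR (an assumption of this sketch).
    """
    en = enabled_ports(ctrg)
    return [int(any(en[j] and ctrg[j][i] and di[j] for j in range(N)))
            for i in range(N)]


def contended_outputs(ctrg):
    """Outputs selected by more than one enabled input port at once."""
    en = enabled_ports(ctrg)
    return [i for i in range(N)
            if sum(1 for j in range(N) if en[j] and ctrg[j][i]) > 1]
```

Since the chip has no contention logic, a check such as contended_outputs would live in whatever sets up the CTRGs: an admissible setting is one for which it returns an empty list, so that each destination receives data from at most one source at a time.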

In  Figure  9.10  the  data  path  for  the  output  port  i  (DO  i)  is  shown.  For 
reliability  reasons  each  gate  is  duplicated  by  a  parallel  gate  with  the  same  logic 
function.  This  method  will  protect  the  chip  from  an  open  gate  fault  (stuck 
low)  since  its  parallel  gate  can  carry  the  function  alone.  If  a  gate  output  is 
stuck  on  high  it  will  cause  loss  of  functionality  of  only  part  of  the  chip;  the 
closer  to  the  chip  output  that  the  gate  is,  the  larger  the  part  of  the  chip  that 
will  lose  its  functionality. 

Although  the  possibilities  to  recover  from  faults  are  many,  only  a  few  will 
be  discussed  here  to  illustrate  the  main  strong  points  of  the  design. 

(1) Suppose a single gate in the crossbar chip is stuck at low in the data path,
then the error will not exhibit itself because of the gate parallel to the
faulty one.

(2) Suppose it is known that the input data path (external to the chip) is
stuck on high, then the control of that port will load CTRG appropriately
and disconnect the data path. (Stuck on high means that the path is
stuck in such way that the DO i connected to this input would be forced
high.)

(3) Suppose it is known that input data path of input port j is stuck on high
and also the control logic of that port is not functioning. Then the
processor attached to input port j - 1 modulo 4 can use its disable logic
(b4) to disable the faulty port j.

(4) If the input path is stuck at low, the functionality of the rest of the chip
will not be impaired.

Figure 9.10:

The data path for output port i (DO i). Note: an overbar on bi denotes the
complement of bit i.

If the combined delay from the output of the SDMA to the input of the
DDMA (see Figure 9.4) exceeds the desired clock cycle time, then the path has
to be broken by a set of registers, one per port, allowing data to be pipelined
through with shorter delays. When the crossbar chip is located physically near
the source processors, then buffers should be placed at the output of the
crossbar chip (on the chip itself). The decision to place the buffers at the
outputs of the crossbar is based on the assumption that the delay from an
output of the SDMA to an output of the crossbar chip is one half of the
combined delay from the output of the SDMA to the input of the DDMA. In
this system the combined delay is short; therefore there is no need to break the
path.

In this section the architecture of the type II crossbar will be described. The
type II crossbar (see Figure 9.11) is very similar to the type I implementation
with the exception of the following. The CTI i and DI i inputs are merged into a
single pin. This results in a savings of Nxb pins for an NxNxb crossbar. The
reliability has been compromised somewhat, however, because if DI i is stuck
on one, the CTRG i cannot be loaded to get the DI i off the output bus DO j
(if DI i is connected to DO j). However, it is still possible to get DI i off the
output bus DO j by using the b4 bit of CTRG i-1 modulo 4.

The number of functional pins in a type II chip is as follows. Input port i:
two pins (DI i and CTK i). Output port j: one pin (DO j). Reset: one pin
(RES). For a 4x4x1 crossbar the total number of signal (control and data)
lines (not including RES) is 3x4 = 12 (3xN for an NxNx1 crossbar). For an
8x8x1 crossbar the total number of signal lines is 3x8 = 24. Similar to the
analysis for a type I crossbar design, assuming there can be up to 80 signal pins
on a chip, a 4x4x6 or 8x8x3 crossbar can be constructed.
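
The pin counting for both chip types can be checked with a short Python sketch. One reading of the budget is assumed here and is not stated explicitly in the text: each 1-bit slice is charged its signal lines plus one RES pin of its own against the 80-pin limit, while power and ground pins are not counted. The function name and parameters are hypothetical.

```python
def max_slices(n, pins_per_input_port, pin_budget=80):
    """Largest number of 1-bit slices of an n x n crossbar on one chip.

    Each slice needs n * pins_per_input_port input-side pins, n output
    pins (DO), and, as assumed in this sketch, one RES pin of its own.
    """
    signal_lines = n * (pins_per_input_port + 1)
    return pin_budget // (signal_lines + 1)


# Type I: three pins per input port (DI, CTI, CTK).
type_1 = {n: max_slices(n, 3) for n in (4, 8)}
# Type II: two pins per input port (merged DI/CTI, plus CTK).
type_2 = {n: max_slices(n, 2) for n in (4, 8)}
```

Under this assumption the sketch gives 4 and 2 slices for type I (4x4x4 and 8x8x2) and 6 and 3 slices for type II (4x4x6 and 8x8x3), which agrees with the sizes quoted in the text.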


9.7 Network Architectures


Several different network architectures and their implementations using
type I or type II crossbar chips will be presented in this section. Each scheme
has sufficient throughput.

Each  scheme  will  be  evaluated  using  the  following  criteria. 

(1)  Types  of  interconnection  functions  admissible. 

(2)  Number  of  chips. 

(3)  Cost  of  connections  between  the  chips  of  the  network. 

(4)  Fault  detection  (hard  faults). 


(5)  Fault  recovery. 

(6)  Extendibility  to  a  larger  number  of  processors. 

(7)  Extendibility  to  larger  bandwidth. 

Although the required interconnection functions demand only a 4x4
crossbar, for the following reliability reasons an 8x8 crossbar will be used.
Suppose two PEs fail or the paths to them fail in a single group j (of size four).
This would cause the load on the two remaining PEs to double. Using an 8x8
crossbar it is possible to allocate one PE from group j + 1 and thereby balance
the load over two groups (and their associated PEs).

The DMA network port consists of four bits, which provides sufficient
bandwidth (4 Mbyte/sec/PE) to meet the specifications in Section 9.3. This
can be calculated as follows. Each PE has a four-bit wide bus. Based upon the
longest distance of the connections between source and destination PEs
(approximately 5-10 ft.) a single wire can transfer approximately 1 Mbyte/sec.
A bus width of four bits allows 4 Mbyte/sec. Now, consider the swinging buffer
memory bandwidth. Since the output memories are capable of reading
2 bytes/100 ns (that is, 20 Mbyte/sec) the memories, too, have sufficient
bandwidth. The above calculation shows that each PE has available a network
(and memory) bandwidth of up to 4 Mbyte/sec/PE, which satisfies the
requirements for both Phases A and B.
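
The bandwidth arithmetic above can be restated as a small Python sketch. The function name and parameter names are hypothetical; the numeric defaults are the figures quoted in the text.

```python
def per_pe_bandwidth(port_width_bits=4, wire_rate_mbyte_s=1.0,
                     bytes_per_read=2, read_time_ns=100):
    """Return (network, memory, available) bandwidth in Mbyte/sec per PE."""
    network_bw = port_width_bits * wire_rate_mbyte_s
    # 1 byte/ns equals 1000 Mbyte/sec, so scale bytes per ns by 1000.
    memory_bw = bytes_per_read * 1000 / read_time_ns
    return network_bw, memory_bw, min(network_bw, memory_bw)
```

With the default values the network side (4 Mbyte/sec) is the limiting factor and the memory side (20 Mbyte/sec) has a factor of five to spare. Calling the function with port_width_bits=3 gives a rough idea of the throughput left to a port after one bit path has been taken out of service.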

Consider  scheme  1  shown  in  Figure  9.12. 

(1)  Interconnection  functions  admissible:  The  functions  admissible  are  the 
full  crossbar  functions. 

(2) Number of chips required: Using chip type I or type II, two chips are
required.

(3)  Cost  of  connections  between  chips  of  the  network:  Not  applicable. 

(4)  Fault  detection:  The  header  method  and  diagnostics  will  detect  multiple 
faults  of  the  data  path  and  also  faults  in  the  control,  e.g.,  routing  to  an 
incorrect  destination  processor.  (For  more  details  see  Section  9.8  on  fault 
detection  and  recovery.) 

(5)  Fault  recovery: 

(a)  If  a  bit  path  is  broken  either  in  the  wires  or  on  the  chip,  then  the 
SDMA  will  reformat  the  network  word  and  send  it  over  the  other 
correctly  working  bit  paths.  The  DDMA  will  then  deformat  the 
network  word  into  the  system  16-bit  word. 

(b)  If  the  control  of  a  single  bit  path  is  not  functioning,  the  fault  will  be 
handled  as  if  the  bit  path  is  broken. 

(6)  Extendibility  to  a  larger  number  of  processors:  Since  the  required 
interconnection  functions  can  be  partitioned  (restricted)  to  groups  of  four 
processors,  the  scheme  is  easily  extendible.  Extension  of  the  network  can 
be  accomplished  by  adding  a  complete  interconnection  network  for  each 
additional  two  source  and  destination  groups  (eight  source  processors  and 
eight  destination  processors). 

(7)  Extendibility  to  a  larger  bandwidth:  Since  the  bandwidth  is  limited  by 
the  number  of  wires  per  port,  the  extension  simply  involves  increasing  the 
number  of  wires  per  port  and  also  the  number  of  bit  slices  of  the 
network.  (This  can  be  done  up  to  the  limit  imposed  by  the  swinging 
buffer  bandwidth.) 

Consider scheme 2, shown in Figure 9.13. This system consists of two
complete networks in parallel. If there are no faults, only one of these
networks is used. The outputs from the two networks are either selected by a
multiplexer (with each bit path controlled independently) or by the tri-state
logic inside the chips themselves. It is assumed that faults in either chip can be
contained and will not affect the other chip.

(1)  Types  of  interconnection  functions  admissible:  Same  as  scheme  1. 

(2) Number of chips required: Using chip type I or type II, four chips are
required.

(3)  Cost  of  connections  between  the  chips  of  the  network:  Connections  are 
simple. 

(4)  Fault  detection:  Same  as  scheme  1. 

(5)  Fault  recovery: 

(a) Same as 5(a) for scheme 1.

(b)  If  a  bit  path  is  broken  inside  one  of  the  chips,  then  using  the 
multiplexer  (or  tri-state  control)  the  corresponding  functioning  bit 
path  from  the  other  network  will  be  substituted. 

(c)  If  the  control  for  a  single  bit  path  is  not  functioning,  use  the 
substitution  as  in  (b). 

(6)  Extendibility  to  a  larger  number  of  processors:  Same  as  scheme  1. 

(7)  Extendibility  to  larger  bandwidth:  Within  a  single  network  the  same 
arguments  as  for  scheme  1  hold. 

Consider scheme 3, shown in Figure 9.14. The first (closest to the SDMA)
part of the total network will be referred to as the front network. The second
(closest to the DDMA) part of the network will be referred to as the rear
network. The output port i of the front network and input port i of the rear
network will be referred to as intermediate port i. If the assumption that long
wires are more susceptible to faults than short wires holds, then this scheme
has some advantages.

(1)  Types  of  interconnection  functions  admissible:  Same  as  scheme  1. 

(2) Number of chips: Using chip type I or type II, four chips are required.

(3)  Cost  of  connections  between  the  chips  of  the  network:  Connections  are 
simple. 

(4) Fault detection: Same as scheme 1.

(5) Fault recovery: All techniques presented for scheme 1 can be used in
addition to the following. Suppose source PE i wants to transmit to
destination PE j. If a bit path is broken in the middle wire of port j it is
possible to send data over the middle wires of intermediate port k ≠ j
and then use the rear network crossbar to move the data from port k to
output port j. Depending on the percent utilization of the paths, this
may make system degradation negligible.
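
The rerouting idea of scheme 3 can be sketched in Python as the selection of an intermediate port. This is a hypothetical helper, not part of the design as described: the function name, the fault map middle_ok, and the busy set (intermediate ports already carrying other traffic) are assumptions introduced for illustration.

```python
def choose_intermediate_port(j, middle_ok, busy=()):
    """Pick the intermediate port used to reach destination port j.

    middle_ok[k] is True when the middle wires of intermediate port k work.
    Port j itself is preferred; otherwise the first working, idle port
    k != j is returned, and the rear crossbar then routes port k to
    output port j. Returns None when no intermediate port is usable.
    """
    if middle_ok[j] and j not in busy:
        return j
    for k, ok in enumerate(middle_ok):
        if k != j and ok and k not in busy:
            return k
    return None
```

How often a free intermediate port exists depends on the utilization of the paths, which is why the resulting degradation may or may not be negligible in practice.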

In  this  paragraph  the  scheme  4  will  be  described.  It  is  possible  to  combine 
schemes  2  and  3  and  get  the  benefits  of  both  schemes.  It  will  however  involve 
four  times  more  hardware  than  absolutely  necessary  from  a  connectivity  and 
throughput  point  of  view. 

In this paragraph the network architecture for phase B will be presented.
To construct the network for a system consisting of 16 source PEs and 16
destination PEs, the schemes 1 through 4 can be used as follows. For each set
of eight source PEs together with eight destination PEs construct an
independent network. That means that for phase B (16 source PEs, 16
destination PEs), there is one 8x8 network for source PEs 0-7 communicating
with destination PEs 0-7 and there is another independent 8x8 network for
source PEs 8-15 communicating with destination PEs 8-15.




9.8  Fault  Detection  and  Recovery 


Three  techniques  for  fault  detection  can  be  used:  (1)  parity  generation  and 
checking,  (2)  system  run  diagnostics,  and  (3)  block  header  generation  and 
checking  during  the  normal  mode  of  operation.  Consider  the  latter  two  in 
more  detail. 

For system run diagnostics, the SDMA of processor i will either generate
or use prestored test patterns to test all the bit paths of the network. It will
send the patterns to all the destinations within the group and thereby test the
data paths and controls of the network. The message will have the following
format. At the beginning and the end of the block there will be a header
containing the source field, destination field, opcode field, and block count.
Some header formats and dialog techniques are discussed in [ThC83]. The
scheme presented here is an augmented version of these formats for increased
ease of fault detection. The opcode will say which diagnostic is being run.
That will notify the DDMA what it should specifically test. Some test
patterns may follow the header, depending upon the particular diagnostic. The
DDMA will analyze the header’s destination field to check the control of the
network. The DDMA will then send the error report to the SDMA. This is
done through the report network (see Figure 9.4).


The report network is an independent network used by the destination
PEs to return status and error reports, or any information that the diagnostic
routine requests. While the necessary bandwidth is low, for reliability reasons
it should consist of at least four one-bit slices. Architecturally it is identical to
the data network (that is, an 8x8 crossbar). It is important that the SDMA
originating the diagnostic gets the error report even if the report network is not
completely operational. This will be accomplished by trading throughput for
redundancy in the information. Basically the error report will be sent serially
over each of the bit paths belonging to the particular port being tested. For
the error report to get back to the testing SDMA it is then sufficient if only one
bit path in the report network is non-faulty. (The SDMA will analyze the
header of the report message sent by the DDMA and check it for correctness in
a way similar to that used by the DDMA to check the header of the data
message.) The error report itself should be encoded by a multiple error correcting
code, because soft errors in the error report could have catastrophic
consequences. The reason why it is important for the testing SDMA to receive
the error report is that it can then make the best decision about which
hardware is faulty and should not be used. The more information that is
available to the testing SDMA, the smaller the amount of hardware that will
have to be reconfigured (while still being sufficient). The major philosophy
here is that the detection of faults in the network as well as subsequent
reconfiguration (discussed in the next section) is done locally, independent of
the system monitor. The exact description of the error will be assembled and
broadcast to all the source PEs by the DDMA. For example, if destination j has
bit path k broken, all the source PEs when sending data to the destination j
will format their data in such a way as not to use bit path k. This describes
only the flavor of possible diagnostics and many more are possible. Further
research is required in this area.

Block header generation and checking during normal mode of operation
can be implemented as follows. Each block during a normal transmission
dialog will contain a header of form (i,j,k,l,m), where

i is the source PE address,
j is the destination PE address,
k is the number of this block within the current transmission dialog,
l is the operation to be performed by the destination PE on the data,
m is the multiple error correction code on the header.

First, the SDMA sets up the path in the network to the proper destination.
Then the header will be sent on each of the bit paths at the source port to the
DDMA. The DDMA will receive the header (actually multiple headers, one on
each bit path). Trivially, the DDMA will discover any broken bit path. It will
also discover any faulty network controls by examining the destination field. If
the network is implemented as independent slices, it is possible that only some
of the bit paths have bad control, which will be discovered by the destination
field. The block number can be used as follows. The DDMA maintains the
last received block count in a register. By comparing the register with the
incoming block number, it will discover faults such as lost blocks. The headers
have to be soft error protected since they carry important information. The
headers will be resent at the end of the block. If received correctly then, it will
be assumed that data was transmitted correctly with the exception of soft
errors on the data, which will be ignored and treated as additive noise.
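
The header checks performed by the DDMA can be sketched in Python as follows. This is an illustrative model under several assumptions that are not stated in the text: headers are taken to be already decoded (the code field m is omitted, and a copy that could not be recovered is represented by None), the block number is assumed to increase by one per block, and all names are hypothetical.

```python
from collections import namedtuple

# Decoded header fields: source, destination, block number, operation.
Header = namedtuple("Header", "src dst block op")


def check_headers(received, my_address, last_block):
    """Check the header copies received on each bit path of one port.

    received maps bit path index -> Header, or None when nothing valid
    arrived on that path.
    Returns (broken_paths, misrouted_paths, block_in_sequence).
    """
    broken = [p for p, h in received.items() if h is None]
    misrouted = [p for p, h in received.items()
                 if h is not None and h.dst != my_address]
    good = [h for h in received.values()
            if h is not None and h.dst == my_address]
    # Assumed numbering: each block carries the previous number plus one.
    in_sequence = bool(good) and all(h.block == last_block + 1 for h in good)
    return broken, misrouted, in_sequence
```

A path listed as broken corresponds to a broken bit path, a path listed as misrouted points to faulty control on that slice, and a False sequence flag indicates a lost or repeated block. In each case the DDMA would report the result to the SDMA over the report network.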




In  this  section  several  fault  recovery  techniques  will  be  discussed.  Some  of 
the  techniques  may  be  applicable  to  only  some  network  architectures  and/or 
implementations.  The  possible  hard  faults  can  be  classified  as  follows. 

(1)  Bit  path  in  an  input  or  output  wire  breaks. 

(2) Bit path inside the network chip breaks.

(3) Bit path in a middle wire breaks.

(4) Bit path inside the network chip is stuck on high or low.

(5) The control of some but not all bit lines (of a single path) is faulty and
the destination port is not receiving all of its bits.

(6) The control of all the bit lines (of a single path) is faulty and the
destination port is (a) not receiving any data or (b) receiving data
destined for another processor.

(7)  PE  fault. 

It can be seen in the section on fault detection (Section 9.8) that all of these
faults are detectable by the header and status report during normal operation.
The question of how to reconfigure the network will depend upon the network
architecture. For more details, see the section on network architectures
(Section 9.7).

When  a  fault  occurs,  it  will  be  discovered  by  the  DDMA  at  the  next  block 
transmission.  The  DDMA  will  then  send  an  error  report  to  the  SDMA.  The 
SDMA  will  start  diagnostic  routines  to  evaluate  the  exact  nature  of  the  fault 
(for example, a faulty bit path). The SDMA and DDMA will then reconfigure
their  hardware  (for  example,  format  the  network  word  to  skip  the  faulty  bit 
path).  At  this  time  the  SDMA  will  also  notify  the  system  monitor  about  the 
new reconfiguration. The system monitor does not have to be involved in the
network reconfiguration; it will just notify the operator that it occurred.


9.9 Conclusions


For  this  application,  and  given  current  and  near  future  technology,  a 
crossbar  based  interconnection  network  is  very  well-suited  to  the  task  under 
consideration.  Two  different  fault  tolerant  chip  architectures  were  presented. 
Four  network  architectures  were  designed  and  their  characteristics  described. 
Several fault detection and recovery techniques on the system level were shown,
since fault tolerance is a salient issue of this system.


10.1 Introduction

Parallel  computation  is  one  way  to  take  advantage  of  the  low-cost 
processing  power  made  possible  by  VLSI  technology.  The  SIMD  mode  of 
parallelism  has  been  successfully  exploited  in  a  number  of  problem  domains.  A 
critical  architectural  feature  of  a  large-scale  SIMD  system  is  the  interconnection 
network.  A  variety  of  networks  have  been  proposed  and  analyzed  [Sie79a]. 
The  choice  of  which  network  to  implement  in  a  system  is  a  function  of  factors 
such  as  the  intended  computational  environment  (i.e.,  task  domain)  for  the 
system,  construction  time  and  cost  constraints  for  building  the  system,  and  the 
capabilities  of  the  interconnection  networks.  One  of  the  ways  in  which  to 
measure the capabilities of a network is to examine its ability to do different
data  permutations.  Here,  the  abilities  of  two  single  stage  networks  to  perform 
the  “shuffle”  data  movement  are  evaluated. 

This paper extends SIMD interconnection network studies presented in
[Sie77, Sie79b]. In particular, the ability of the PM2I and Illiac single stage
SIMD machine interconnection networks to perform the shuffle interconnection
is examined. Two algorithms for an SIMD or multiple-SIMD machine with the
PM2I network to perform the shuffle are given. One algorithm is used in the
event that the SIMD machine is of the same size (in terms of number of
processors) as the shuffle to be emulated. The other algorithm is used when
the shuffle to be performed is of smaller size than the given machine with the
PM2I network. It is proven that both algorithms require only one more
network transfer than the previously published lower bound (which is log2 S for
a shuffle on S elements [Sie77]). The PM2I algorithm is used as a basis for an
algorithm to do the shuffle with the Illiac network in 2√N - 1 transfers. A
lower bound of 2√N - 4 on the emulation of the shuffle using the Illiac
network (and a different algorithm to perform the emulation) is presented in
[NaS80].


10.2  Overview 

In Section 10.3 the basic concepts are presented. In Section 10.4 an
overview of the Illiac, PM2I, and Shuffle-Exchange interconnection networks is
given. In Section 10.5 two algorithms for the PM2I to perform the shuffle are
developed and proven correct. This is used as a basis for the algorithm
for performing the shuffle with the Illiac network, which is presented in Section
10.6. In Section 10.7 the conclusions are presented.


10.3  SIMD  Machines 

Typically, an SIMD (single instruction stream - multiple data stream)
machine [Fly66] is a computer system consisting of a control unit, N processors,
N memory modules, and an interconnection network (e.g., Illiac IV [BoD72]).
The control unit broadcasts instructions to the processors, and all active
processors execute the same instruction at the same time. Each active
processor executes the instruction on data in its own memory module. The
interconnection network provides for communications among the processors
and memory modules. A multiple-SIMD system is a parallel processing system
which can be structured as one or more independent SIMD machines, each with
its own control unit (e.g., MAP [Nut77]).

One  way  to  configure  an  SIMD  machine  is  as  a  set  of  N  processing 
elements  (PEs)  interconnected  by  a  network,  where  each  PE  consists  of  a 
processor with its own memory. This is shown in Figure 10.1 and is called the
PE-to-PE organization. An alternative organization is to position the network
between the processors and the memories. The PE-to-PE paradigm will be
assumed; however, the results presented will be applicable to the other
organization also.

The model of an SIMD machine presented in [Sie79b] is used here. The
assumptions  made  about  the  SIMD  machine  to  be  used  as  the  model  are 
intentionally  minimal  so  that  the  material  presented  is  applicable  to  a  wide 
range  of  machines. 

There are N PEs, addressed (numbered) from 0 to N-1, where N = 2^m. It
is assumed that each processor contains a fast access general purpose register A
and a data transfer register (DTR). When data transfers among PEs occur, it
is the DTR contents of each PE that are transferred. The notation
"A ← DTR" means the contents of the DTR are copied into the A register.
The notation "A ↔ DTR" means the two registers exchange their contents.

Figure 10.1: PE-to-PE SIMD machine configuration, with N PEs.

The PE address masking scheme uses an m-position mask to specify which
PEs are to be activated [Sie77]. Each position of the mask will contain either a
0, 1, or X ("don't care"). The only PEs that will be active are those that
match the mask in each position: 0 matches 0, 1 matches 1, and X matches 0
or 1. For example, if N = 8 and the mask is 1X0, then only PEs 6 = 110 and
4 = 100 are active. Superscripts are used as repetition factors, e.g., X^3 0 1^2 is
XXX011. Square brackets will be used to denote a mask. Each PE instruction
and interconnection function (defined below) will be accompanied by a mask
specifying which PEs will execute that command.
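
The masking scheme can be illustrated with a short sketch. The function names `matches` and `active_pes` are hypothetical; masks are assumed to be written as strings over 0, 1, and X with the most significant position first, and repetition factors are assumed to be expanded already.

```python
def matches(address, mask):
    """True if the PE address matches the mask in every position."""
    bits = format(address, f"0{len(mask)}b")   # most significant bit first
    return all(c == "X" or c == b for c, b in zip(mask, bits))


def active_pes(mask):
    """List the addresses of the PEs activated by an m-position mask."""
    return [p for p in range(2 ** len(mask)) if matches(p, mask)]


print(active_pes("1X0"))      # N = 8: PEs 4 = 100 and 6 = 110
print(active_pes("XXX011"))   # N = 64: the expansion of X^3 0 1^2
```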

An interconnection network can be described by a set of interconnection
functions, where each interconnection function is a bijection (permutation) on
the set of PE addresses [Sie77]. When an interconnection function f is applied,
PE i sends the contents of its DTR to the DTR of PE f(i). This occurs for all i
simultaneously, for 0 ≤ i < N and PE i active. Saying that an interconnection
function is a bijection means that every PE sends data to exactly one PE, and
every PE receives data from exactly one PE (assuming all PEs are active). In
this model, it is assumed that an inactive PE can receive data, but cannot send
data. To pass data from one PE to another PE, a programmed sequence of one
or more interconnection functions must be executed, moving the data by a
single transfer or by passing the data through intermediary PEs. Since there is
a single instruction stream in an SIMD machine, all active PEs must use the
same interconnection function (connection) at the same time.


Figure 10.2: Illiac network for N = 16. (The actual Illiac IV SIMD
machine had N = 64.) Vertical lines are +√N and -√N. Horizontal
lines are +1 and -1.


Figure 10.3: PM2I network for N = 8. (a) PM2_{+0} connections,
(b) PM2_{+1} connections, (c) PM2_{+2} connections.
For the PM2_{-i} connections, 0 ≤ i ≤ 2,
reverse the direction of the arrows.


multistage networks. Various properties of the PM2I are discussed in [FiF82,
PrK80, Sie77, Sie79b, Sie80].


The Shuffle-Exchange network consists of the shuffle interconnection
function and the exchange interconnection function:

shuffle(p_{m-1} p_{m-2} ... p_1 p_0) = p_{m-2} p_{m-3} ... p_1 p_0 p_{m-1}
exchange(p_{m-1} p_{m-2} ... p_1 p_0) = p_{m-1} p_{m-2} ... p_1 p̄_0

where p̄_0 denotes the complement of p_0. For example, shuffle(3) = 6 and
exchange(6) = 7, for N ≥ 8. This network is
shown in Figure 10.4 for N = 8. The shuffle is also included in the networks of
the Omen [Hig72] and RAP [CoG74] systems. The multistage omega network
is a series of m Shuffle-Exchanges [Law75]. Features of the Shuffle-Exchange
are discussed in [ChL81, FiF82, Lan76, LaS76, NaS81, NaS82, PrK80, Sie77,
Sie79b, Sie80, Sto71, WuF81].

The  ability  of  each  of  the  PM2I  and  Illiac  networks  to  perform  the 
exchange function in just two transfers was presented in [Sie79b]. Thus, the
algorithms  given  here  for  performing  the  shuffle  can  be  used  to  allow  either  the 
PM2I  or  Illiac  network  to  emulate  the  Shuffle-Exchange  network. 
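
As an illustration, the shuffle and exchange functions can be written directly on integer PE addresses. This is a minimal sketch; the function signatures are assumptions made for the example, with m the number of address bits.

```python
def shuffle(p, m):
    """Rotate the m-bit address p left by one bit position."""
    n = 1 << m
    return ((p << 1) | (p >> (m - 1))) & (n - 1)


def exchange(p):
    """Complement the least significant address bit."""
    return p ^ 1


m = 3   # N = 8
for p in range(1 << m):
    print(p, "->", shuffle(p, m), "(shuffle)", exchange(p), "(exchange)")
```

For 0 ≤ P < N/2 the rotation gives 2P, and for N/2 ≤ P < N it gives 2P + 1 mod N, which is the form used in the sections that follow.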


10.5 Shuffling with the PM2I Network


In this section the use of the PM2I network to perform the shuffle will be
examined. Two algorithms for an SIMD or multiple-SIMD machine with the
PM2I network to perform the shuffle are given. One algorithm, presented in
this section, is used in the event that the SIMD machine is of the same size (in
terms of number of PEs) as the shuffle to be emulated. The other algorithm
described is used when the shuffle to be performed is of smaller size than the
given machine with the PM2I network. If the shuffle is of size S (in terms of
number of PEs), then it was shown previously in [Sie77] that a lower bound on
the number of network transfers required for the PM2I to emulate the shuffle
is log2 S. It is proven here that both algorithms require only log2 S + 1
network transfers.

In  this  section  an  algorithm  to  perform  the  shuffle  with  a  PM2I  of  the 
same  size  will  be  developed.  This  algorithm  applies  to  the  case  where  the 
machine  with  the  PM2I  network  is  of  the  same  size  in  terms  of  the  number  of 
processors  as  the  shuffle  to  be  emulated.  The  following  ground  rules  will  be 
used  in  the  design  and  analysis  of  the  algorithm. 

(1)  The  model  and  definitions  presented  in  Sections  10.3  and  10.4  will  be  the 
formal  basis  for  the  results. 

(2)  When simulating the shuffle, the data that is originally in the DTR of PE P
must be transferred to the DTR of PE shuffle(P), for all P, 0 ≤ P < N.

(3)  The time for each algorithm is measured in terms of the number of
executions of interconnection functions required to perform the simulation.

The  reason  for  (3)  can  be  seen  by  considering  the  way  in  which  various 
instructions  can  be  implemented.  The  instructions  in  the  algorithm  can  be 
divided  into  three  categories:  control  unit  operations  (in  C),  register  to  register 
operations  (in  I),  and  inter-PE  data  transfers  (in  F).  Control  unit  operations, 
such  as  incrementing  a  count  register  in  the  control  unit  for  a  “for  loop,”  can, 
in  general,  be  done  in  parallel  (overlapped)  with  the  previously  broadcast  PE 
instruction, thus taking no additional time. Register to register operations
within a PE will probably involve a single chip or, at worst, physically adjacent
chips.  The  inter-PE  data  transfers  will  involve  setting  the  controls  of  the 
interconnection  network  and  passing  data  among  the  PEs,  involving  board  to 
board,  and  probably  rack  to  rack,  distances.  Thus,  unless  the  number  of 
register  to  register  operations  is  much  greater  than  the  number  of  inter-PE 
data  transfers,  the  time  for  the  inter-PE  transfers  will  be  the  dominating  factor 
in  determining  the  execution  time  of  the  algorithm. 

In the algorithm below, ":" indicates a comment. When discussing the
algorithm, "Li" is used as an abbreviation for "statement i of the algorithm."
For j = 0, X^j = X^0, where "X^0" is the null string, i.e., no "X"s.

To understand the concept underlying the algorithm to perform the
shuffle, consider the "distance" the shuffle moves a data item. The data item
originally in the DTR of PE P, 0 ≤ P < N/2, is moved to shuffle(P) = 2P, a
distance of shuffle(P) - P = P. The data item originally in the DTR of PE P,
N/2 ≤ P < N, is moved to shuffle(P) = (2P + 1) mod N, a distance of
shuffle(P) - P = P + 1. This is shown in Table 10.1 for N = 8.

Specifically, data originally in PE P, 0 ≤ P < N, is moved by PM2_{+i} for
each i (i = 0, 1, ..., m-1) with p_i = 1, a total distance of P, to PE 2P mod N.
If N/2 ≤ P < N, then in addition to the previous moves the data will be moved
+1 by PM2_{+0} to (2P + 1) mod N. This is also shown for N = 8 in Table 10.1.
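
The decomposition of the shuffle distance into PM2I moves can be checked with a small sketch. The helper name `pm2i_moves` is hypothetical; it lists only the distances moved and, like Table 10.1, ignores the intermediate PEs through which the data passes.

```python
def shuffle(p, m):
    """Rotate the m-bit address p left by one bit position."""
    return ((p << 1) | (p >> (m - 1))) & ((1 << m) - 1)


def pm2i_moves(p, m):
    """Distances (+2^i) that carry the data of PE p to PE shuffle(p)."""
    n = 1 << m
    moves = [1 << i for i in range(m) if (p >> i) & 1]   # PM2_{+i} for each p_i = 1
    if p >= n // 2:
        moves.append(1)                                   # additional PM2_{+0}
    return moves


m = 3   # N = 8
for p in range(1 << m):
    moves = pm2i_moves(p, m)
    assert (p + sum(moves)) % (1 << m) == shuffle(p, m)
    print(format(p, "03b"), moves)
```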

The difficulty in designing a parallel algorithm for this task arises from the
need to keep track of the flow of N data items among the N PEs. Note that
Table 10.1 does not show the intermediate PEs through which the data is
passed. For example, for N = 8, after executing PM2_{+0} the data originally in
PEs 4 and 5 will both be in PE 5.


Table 10.1: The idea underlying the algorithm for the PM2I
to perform the shuffle, shown for N = 8.

origin PE   distance moved   distance moved by PM2I
number      by shuffle       PM2_{+0}  PM2_{+1}  PM2_{+2}  PM2_{+0}  total

0 = 000         +0              -         -         -         -       +0
1 = 001         +1             +1         -         -         -       +1
2 = 010         +2              -        +2         -         -       +2
3 = 011         +3             +1        +2         -         -       +3
4 = 100         +5              -         -        +4        +1       +5
5 = 101         +6             +1         -        +4        +1       +6
6 = 110         +7              -        +2        +4        +1       +7
7 = 111         +0             +1        +2        +4        +1       +0

In the algorithm below, during steps L3 to L5, for 1 ≤ j ≤ m-1, all of the
data of interest are in even numbered PEs. After L5 has been executed for
j = m-1, the data from PE P, 0 ≤ P < N, has been moved to PE 2P mod N
by using a subset of PM2_{+0}, PM2_{+1}, ..., PM2_{+(m-1)}, in that order. For
N/2 ≤ P < N, L6 executes PM2_{+0} to move data from PE 2P to 2P + 1.

Algorithm to perform the shuffle with a PM2I network of the same size:

(L1)  A ← DTR  [X^{m-1}0]
      :even PEs save DTR contents in A register
(L2)  PM2_{+0}  [X^{m-1}1]
      :odd PEs send DTR data "+1" to even PEs
(L3)  for j = 1 until m-1 do
      begin
(L4)     A ↔ DTR  [X^{m-j-1}1X^{j-1}0]
         :even PEs, j-th bit = 1, switch A and DTR
(L5)     PM2_{+j}  [X^{m-1}0]
         :even PEs send DTR data "+2^j"
      end
(L6)  PM2_{+0}  [X^{m-1}0]
      :half of data sent from even PEs to odd PEs
(L7)  DTR ← A  [X^{m-1}0]
      :reload DTR from A register in even PEs


This algorithm uses m + 1 inter-PE data transfers and m + 1 register to
register  moves.  The  operation  of  this  algorithm  for  N  =  8  (m  =  3)  is  shown  in 
Table  10.2. 
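
The operation of statements L1 through L7 can be simulated in software. The following is a minimal sketch under the machine model of Section 10.3: the lists `a` and `dtr` stand for the A registers and DTRs of the N PEs, and the helpers `matches` and `transfer` are hypothetical names introduced here. A transfer is applied to all active PEs simultaneously, and inactive PEs can receive but not send.

```python
def shuffle(p, m):
    """Rotate the m-bit address p left by one bit position."""
    return ((p << 1) | (p >> (m - 1))) & ((1 << m) - 1)


def matches(address, mask):
    bits = format(address, f"0{len(mask)}b")
    return all(c == "X" or c == b for c, b in zip(mask, bits))


def transfer(dtr, distance, mask):
    """Active PEs send their DTR to PE (i + distance) mod N, all at once."""
    n = len(dtr)
    new = list(dtr)
    for i in range(n):
        if matches(i, mask):
            new[(i + distance) % n] = dtr[i]
    return new


def pm2i_shuffle(dtr):
    """Return (final DTR contents, number of inter-PE transfers)."""
    n = len(dtr)
    m = n.bit_length() - 1
    even, odd = "X" * (m - 1) + "0", "X" * (m - 1) + "1"
    a = [dtr[i] if matches(i, even) else None for i in range(n)]     # L1
    dtr = transfer(dtr, 1, odd)                                      # L2
    transfers = 1
    for j in range(1, m):                                            # L3
        mask = "X" * (m - j - 1) + "1" + "X" * (j - 1) + "0"
        for i in range(n):
            if matches(i, mask):
                a[i], dtr[i] = dtr[i], a[i]                          # L4
        dtr = transfer(dtr, 1 << j, even)                            # L5
        transfers += 1
    dtr = transfer(dtr, 1, even)                                     # L6
    transfers += 1
    for i in range(n):
        if matches(i, even):
            dtr[i] = a[i]                                            # L7
    return dtr, transfers


result, count = pm2i_shuffle(list(range(8)))
print(result, count)
```

With the DTR of PE P initially holding the integer P, the item P should end in position shuffle(P) of the result, and the transfer count should equal m + 1.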

For example, consider the data item initially in the DTR of PE 5 (= 101). PE
5 does not match the mask in L1 ([XX0]). PE 5 does match the mask in L2
([XX1]) and the data is moved to PE PM2_{+0}(5) = 6 (= 110). PE 6 does match
the mask in L4 when j = 1 ([X10]) and the data is moved to the A register of
PE 6. The data is unaffected by L5 when j = 1 (since it is not in the DTR).
PE 6 does match the mask in L4 when j = 2 ([1X0]) and the data is moved to
the DTR of PE 6. PE 6 does match the mask in L5 when j = 2 ([XX0]) and
the data is moved to the DTR of PE PM2_{+2}(6) = 2. PE 2 does match the
mask in L6 ([XX0]) and the data is moved to the DTR of PE PM2_{+0}(2) = 3.
PE 3 does not match the mask in L7 ([XX0]). Thus, the data originally from
PE 5 is moved to PE 3 = shuffle(5). This is shown by the dotted line in Table
10.2.

Proof  that  the  algorithm  is  correct: 

Assume  all  arithmetic  is  mod  N. 

The induction hypothesis (proven correct below) is that after executing
PM2_{+j} in L2 (for j = 0) or L5 (for 1 ≤ j < m) the data originally in the DTR
of PE Q = q_{m-1}...q_1 q_0 will currently be in PE P = p_{m-1}...p_1 p_0 =
(q_{m-1}...q_{j+2} q_{j+1}) * 2^{j+1} + (q_j...q_1 q_0) * 2. (When j = 0,
P = (q_{m-1}...q_2 q_1) * 2 + (q_0) * 2.) The data will be in the A register if q_j = 0
and in the DTR if q_j = 1.

Thus, when j = m-1, the data originally from PE Q is in PE
(q_{m-1}...q_1 q_0) * 2. The data item in the DTR of PE (q_{m-1}...q_1 q_0) * 2 is moved to
PE (q_{m-1}...q_1 q_0) * 2 + 1 by L6, which is correct since this data item is from a
PE where q_j = q_{m-1} = 1, so shuffle(Q) = 2 * Q + 1. The data item in the A
register of PE (q_{m-1}...q_1 q_0) * 2 is moved to the DTR of that PE by L7, which is
correct since this data item is from a PE where q_j = q_{m-1} = 0, so shuffle(Q) =
2 * Q.


Table 10.2: Example of the algorithm for performing the shuffle
using the PM2I when N = 8.
It is assumed that initially the DTR of PE P
contains the integer P, 0 ≤ P < 8.
The dotted line (entries marked *) shows the movement of the data
originally in the DTR of PE 5 (= 101). A "-" marks an entry that is
not of interest.

        Initial    L2         L4, j=1    L4, j=1    L5, j=1
        DTR        DTR        A          DTR        DTR
PE      Contents   Contents   Contents   Contents   Contents

000     000        111        000        111        110
001     001        -          -          -          -
010     010        001        001        010        111
011     011        -          -          -          -
100     100        011        100        011        010
101     101*       -          -          -          -
110     110        101*       101*       110        011
111     111        -          -          -          -

        L4, j=2    L4, j=2    L5, j=2    L6         L7
        A          DTR        DTR        DTR        DTR
PE      Contents   Contents   Contents   Contents   Contents

000     000        110        100        100        000
001     -          -          -          100        100
010     001        111        101*       101        001
011     -          -          -          101*       101*
100     010        100        110        110        010
101     -          -          -          110        110
110     011        101*       111        111        011
111     -          -          -          111        111

To complete the correctness proof it must be shown that the induction
hypothesis is true.

Basis: j = 0.

Case 1: Consider the data item originally in the DTR of PE
Q = q_{m-1}...q_2 q_1 0. This data item is moved to the A register of
that PE by L1. Since q_0 = 0, Q = (q_{m-1}...q_2 q_1) * 2 +
(q_0) * 2 = P. This data is not moved by L2. It remains in the A
register and q_0 = 0. Thus, the induction hypothesis is true for
j = 0 for this case.

Case 2: Consider the data item originally in the DTR of PE
Q = q_{m-1}...q_2 q_1 1. This data item is not moved by L1. It is moved
to the DTR of PE P = Q + 1 by PM2_{+0} in L2. Since q_0 = 1,
Q + 1 = (q_{m-1}...q_2 q_1 1) + 1 = (q_{m-1}...q_2 q_1) * 2 + 2 =
(q_{m-1}...q_2 q_1) * 2 + (q_0) * 2 = P. The data item is in the DTR and
q_0 = 1. Thus, the induction hypothesis is true for j = 0 for this
case.

Induction Step: Assume true for j = k - 1 and show true for j = k.

Case 1: Consider the data item originally in the DTR of PE
Q = q_{m-1}...q_2 q_1 q_0, where q_{k-1} = 0.

From the induction hypothesis when j = k-1, this data item is in
the A register of PE P = p_{m-1}...p_1 p_0 = (q_{m-1}...q_{k+1} q_k) * 2^k +
(q_{k-1}...q_1 q_0) * 2.

Subcase 1a: p_k = 1. The A register data is moved to the DTR of PE P
by L4 and then to the DTR of PE P + 2^k by L5. Recall
P = p_{m-1}...p_1 p_0 = (q_{m-1}...q_{k+1} q_k) * 2^k + (q_{k-1}...q_1 q_0) * 2.
Since q_{k-1} = 0, (0 q_{k-2}...q_1 q_0) * 2 < 2^k. Thus, if p_k = 1, it
must be that q_k = 1. Since q_k = 1, P + 2^k =
(q_{m-1}...q_{k+1} 1) * 2^k + (q_{k-1}...q_1 q_0) * 2 + 2^k =
(q_{m-1}...q_{k+1}) * 2^{k+1} + 2^k + (q_{k-1}...q_1 q_0) * 2 + 2^k =
(q_{m-1}...q_{k+1}) * 2^{k+1} + (1 q_{k-1}...q_1 q_0) * 2 =
(q_{m-1}...q_{k+1}) * 2^{k+1} + (q_k q_{k-1}...q_1 q_0) * 2.
Furthermore, the data is in the DTR and q_k = 1. Thus,
the induction hypothesis is true for j = k for this subcase.

Subcase 1b: p_k = 0. The A register data is kept in the A register of PE
P and not moved by L4 or L5. As in Subcase 1a, since
q_{k-1} = 0, (0 q_{k-2}...q_1 q_0) * 2 < 2^k. Thus, if p_k = 0, it must
be that q_k = 0. Since q_k = 0, P =
(q_{m-1}...q_{k+1} 0) * 2^k + (q_{k-1}...q_1 q_0) * 2 =
(q_{m-1}...q_{k+1}) * 2^{k+1} + (q_k...q_1 q_0) * 2.
Furthermore, the data is in the A register and q_k = 0.
Thus, the induction hypothesis is true for j = k for this
subcase.

Case 2: Consider the data item originally in the DTR of PE
Q = q_{m-1}...q_1 q_0, where q_{k-1} = 1.

From the induction hypothesis when j = k-1, this data item is in
the DTR of PE
P = p_{m-1}...p_1 p_0 = (q_{m-1}...q_{k+1} q_k) * 2^k + (q_{k-1}...q_1 q_0) * 2.

Subcase 2a: p_k = 1. The DTR data is moved to the A register of PE P
by L4 and is not moved by L5. Recall p_{m-1}...p_1 p_0 =
(q_{m-1}...q_{k+1} q_k) * 2^k + (q_{k-1}...q_1 q_0) * 2. Since q_{k-1} = 1,
(q_{k-1}...q_1 q_0) * 2 = 2^k + (q_{k-2}...q_1 q_0) * 2. Thus, if p_k = 1, it
must be that q_k = 0. Since q_k = 0, P = (q_{m-1}...q_{k+1} 0) * 2^k
+ (q_{k-1}...q_1 q_0) * 2
= (q_{m-1}...q_{k+1}) * 2^{k+1} + (q_k...q_1 q_0) * 2.
Furthermore, the data is in the A register and q_k = 0.
Thus, the induction hypothesis is true for j = k for this
subcase.

Subcase 2b: p_k = 0. The DTR data is kept in the DTR of PE P (not
moved by L4). It is then moved to the DTR of PE P + 2^k
by L5. Since q_{k-1} = 1, (q_{k-1}...q_1 q_0) * 2 =
2^k + (q_{k-2}...q_1 q_0) * 2. Thus, if p_k = 0, it must be that
q_k = 1. Since q_k = 1, P + 2^k = (q_{m-1}...q_{k+1}) * 2^{k+1} +
(q_k q_{k-1}...q_1 q_0) * 2 as in Subcase 1a. Furthermore, the data
is in the DTR and q_k = 1. Thus, the induction hypothesis is
true for j = k for this subcase.

This completes the proof that the induction hypothesis is true.

No data of interest is destroyed by the inter-PE data transfers. The
transfer in L2 overwrites no relevant data since such data is saved in the A
registers in L1. The transfers in L5, for 1 ≤ j < m, move data among the
even numbered PEs (i.e., all even numbered PEs transfer data simultaneously)
so no data is overwritten. Finally, the transfer in L6 overwrites data in the
DTRs of the odd numbered PEs; however, all data of interest are in even
numbered PEs at that point.

This completes the correctness proof. All the data items have been moved
as the shuffle would have moved them.
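
The induction hypothesis can also be checked mechanically for small machines. The sketch below is a numerical check and not a substitute for the proof: it replays L1 through L5 and, after each PM2_{+j}, verifies that the item from PE Q is in PE (q_{m-1}...q_{j+1}) * 2^{j+1} + (q_j...q_0) * 2 mod N, in the A register if q_j = 0 and in the DTR if q_j = 1. The function names are hypothetical.

```python
def send(dtr, distance, parity):
    """PEs with the given address parity send their DTR +distance mod N."""
    n = len(dtr)
    new = list(dtr)
    for i in range(parity, n, 2):
        new[(i + distance) % n] = dtr[i]
    return new


def expected_pe(q, j, n):
    high = (q >> (j + 1)) << (j + 1)       # (q_{m-1}...q_{j+1}) * 2^{j+1}
    low = q & ((1 << (j + 1)) - 1)         # (q_j...q_1 q_0)
    return (high + 2 * low) % n


def hypothesis_holds(a, dtr, j):
    n = len(dtr)
    for q in range(n):
        register = dtr if (q >> j) & 1 else a
        if register[expected_pe(q, j, n)] != q:
            return False
    return True


def check_induction(m):
    n = 1 << m
    dtr = list(range(n))
    a = [dtr[i] if i % 2 == 0 else None for i in range(n)]   # L1
    dtr = send(dtr, 1, 1)                                    # L2 (odd PEs)
    ok = hypothesis_holds(a, dtr, 0)
    for j in range(1, m):
        for i in range(0, n, 2):
            if (i >> j) & 1:
                a[i], dtr[i] = dtr[i], a[i]                  # L4
        dtr = send(dtr, 1 << j, 0)                           # L5 (even PEs)
        ok = ok and hypothesis_holds(a, dtr, j)
    return ok


print(all(check_induction(m) for m in range(1, 8)))
```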

In this section an algorithm for the PM2I to emulate a shuffle of smaller size
will be developed. This algorithm is applicable when the machine with the PM2I
network is larger (in terms of number of PEs) than the shuffle to be emulated.
To solve this problem it will be decomposed into several subproblems.

It was shown in [Sie80] that the PM2I network can be partitioned into
independent subnetworks. There are some constraints on how this can be
done. Suppose there is a PM2I network of size N = 2^m and it is desired to
partition the network into groups of size 2^r (0 < r < m). Recall that the PEs
are addressed as p_{m-1} p_{m-2} ... p_0. In a group of size 2^r all PEs in the group
must have the same m-r least significant bits. That means that for each group
the value of address bit positions p_{m-r-1} p_{m-r-2} ... p_0 is unique. Denote

B = p_{m-r-1} p_{m-r-2} ... p_0.

This group (identified uniquely by its value of B) then constitutes a logical
PM2I network of size 2^r, with the PEs logically numbered from 0 to 2^r - 1 by
the r high order bits of their physical address. Each logical function PM2_{±j}
will be executed by the physical function PM2_{±(j+(m-r))}.

The previous algorithm for the PM2I network to emulate the shuffle will
be mapped into the logical PM2I network of size 2^r. This can be implemented
as follows.

(a)  Let the logical PE addresses in a set of size 2^r be denoted as

{Q} = {q_{r-1} q_{r-2} ... q_0 | q_i = 0,1}.

Let the physical PE addresses in a set of size 2^m be denoted as

{P} = {p_{m-1} p_{m-2} ... p_0 | p_i = 0,1}.

Define a map from the logical PE address set {Q} into the group B of
the physical PE address set {P} as follows:

{Q} → {P}:  q_{r-1} q_{r-2} ... q_0 → q_{r-1} q_{r-2} ... q_0 B

(b)  Map the logical function set {PM2_{+j}} into the physical function set as
follows:

ψ: {PM2_{+j}} → {PM2_{+k}}

ψ(PM2_{+j}) = PM2_{+(j+(m-r))}, where 0 ≤ j < r.

Algorithm for a PM2I of size 2^m to emulate a shuffle of size 2^r (1 ≤ r ≤ m):

(L1)  A ← DTR  [X^{r-1}0B]
      :logical even PEs save DTR data in A register
(L2)  PM2_{+(m-r)}  [X^{r-1}1B]
      :logical odd PEs send DTR data logical "+1" to logical even PEs
(L3)  for j = 1 until r-1 do
      begin
(L4)     A ↔ DTR  [X^{r-j-1}1X^{j-1}0B]
         :logical even PEs, logical j-th bit = 1, switch A and DTR
(L5)     PM2_{+(j+(m-r))}  [X^{r-1}0B]
         :logical even PEs send DTR data logical "+2^j"
      end
(L6)  PM2_{+(m-r)}  [X^{r-1}0B]
      :logical even PEs send data to logical odd PEs
(L7)  DTR ← A  [X^{r-1}0B]
      :logical even PEs reload DTR from A register

The proof of correctness of this algorithm is directly based upon the
theory of partitionability [Sie80] and the algorithm for performing the shuffle
with a PM2I of the same size. Performance evaluation of this algorithm follows.
The general lower bound result in [Sie77] is applicable with m replaced by r,
yielding a lower bound of r transfers. Thus, this algorithm with a performance of
r + 1 transfers compares favorably with the lower bound. This algorithm is
applicable in the following situations. Suppose there is an SIMD machine with
a PM2I network; then it is possible to select a group of PEs (with certain
constraints) and let the group perform a shuffle (while the other PEs are
disabled). Alternatively, it is possible to "partition" the network into equal size
groups and let any or all of the groups perform the shuffle concurrently, using
appropriate masking. The groups which will perform the shuffle will be
determined by the value of "B" in the algorithm. Suppose there is a
multiple-SIMD machine with a PM2I network; then the algorithm can be used
so that each SIMD submachine can emulate the shuffle independently. Since the
submachines are independent, they can be of different sizes.
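
The partitioned algorithm can be simulated in the same manner as the algorithm for a PM2I of the same size. In the sketch below, which is offered as an illustration under stated assumptions, a logical PE q of group B is taken to be the physical PE q * 2^(m-r) + B, and each logical PM2_{+j} is executed as the physical PM2_{+(j+(m-r))}; the function and parameter names (`pm2i_sub_shuffle`, `b`) are hypothetical.

```python
def shuffle(p, m):
    return ((p << 1) | (p >> (m - 1))) & ((1 << m) - 1)


def matches(address, mask):
    bits = format(address, f"0{len(mask)}b")
    return all(c == "X" or c == b for c, b in zip(mask, bits))


def transfer(dtr, distance, mask):
    """Active PEs send their DTR to PE (i + distance) mod N, all at once."""
    n = len(dtr)
    new = list(dtr)
    for i in range(n):
        if matches(i, mask):
            new[(i + distance) % n] = dtr[i]
    return new


def pm2i_sub_shuffle(dtr, r, b):
    """Shuffle of size 2^r within group b of a PM2I machine of size len(dtr)."""
    n = len(dtr)
    m = n.bit_length() - 1
    s = m - r                                      # number of low-order bits in B
    group = format(b, f"0{s}b") if s else ""       # B as a bit string
    even = "X" * (r - 1) + "0" + group
    odd = "X" * (r - 1) + "1" + group
    a = [dtr[i] if matches(i, even) else None for i in range(n)]    # L1
    dtr = transfer(dtr, 1 << s, odd)                                # L2
    transfers = 1
    for j in range(1, r):                                           # L3
        mask = "X" * (r - j - 1) + "1" + "X" * (j - 1) + "0" + group
        for i in range(n):
            if matches(i, mask):
                a[i], dtr[i] = dtr[i], a[i]                         # L4
        dtr = transfer(dtr, 1 << (j + s), even)                     # L5
        transfers += 1
    dtr = transfer(dtr, 1 << s, even)                               # L6
    transfers += 1
    for i in range(n):
        if matches(i, even):
            dtr[i] = a[i]                                           # L7
    return dtr, transfers


# Machine of size 16 (m = 4), shuffle of size 4 (r = 2) in group B = 01.
result, count = pm2i_sub_shuffle(list(range(16)), r=2, b=1)
print(result, count)
```

In this sketch the DTRs of PEs outside the selected group are left unchanged, and the count of transfers is r + 1.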


10.6 Shuffling with the Illiac Network


In this section the use of the Illiac network to perform the shuffle will be
examined. A lower bound of 2(n-2) transfers, where n = √N, can be derived
from [NaS80]. In [NaS80] there is also a procedure for constructing an
algorithm for a mesh-connected computer to perform the shuffle in 2n transfers
(a mesh network is the same as an Illiac network without the "wrap around"
edge links). In this section, an explicit algorithm for the Illiac to perform the
shuffle in 2n-1 transfers is given. It is based upon the algorithm for the PM2I
to perform the shuffle. Since the Illiac cannot be partitioned into independent
subnetworks [Sie80], consideration of performing the shuffle on a subset of PEs
is inappropriate.

In this section an algorithm to perform the shuffle with the Illiac will be
developed. Consider an algorithm for performing a size N shuffle
interconnection function on a size N Illiac network, where n = √N = 2^{m/2} is an
integer (i.e., m is even). The three ground rules listed in Section 10.5 are also
used in this section.

The algorithm to perform the shuffle using the Illiac network will be
constructed by replacing each PM2I interconnection function in the algorithm
of Section 10.5 (for a PM2I of the same size) with Illiac interconnection
functions. For L2, use "Illiac_{+1} [X^{m-1}1]," since Illiac_{+1} = PM2_{+0} by
definition. Similarly, for L6, use "Illiac_{+1} [X^{m-1}0]."
To do L5, first recall that only the even numbered PEs contain the data of
concern (after L2 is executed and before L6 is executed). Therefore, it is
acceptable to use "PM2_{+j} [X^m]" in L5, since any data movement among the
odd numbered PEs is ignored (and overwritten by L6). To perform
"PM2_{+j} [X^m]," for 1 ≤ j < m, with the Illiac network the algorithms
presented in [Sie79b] can be used. Specifically, to perform "PM2_{+j} [X^m]" for
1 ≤ j < m/2 use:

for i = 1 until 2^j do Illiac_{+1} [X^m]

since 2^j executions of Illiac_{+1} are equivalent to +2^j = PM2_{+j}. Analogously, to
perform "PM2_{+j} [X^m]" for m/2 ≤ j < m use:

for i = 1 until 2^j/n do Illiac_{+n} [X^m]

since 2^j/n executions of Illiac_{+n} are equivalent to +2^j = PM2_{+j}. The total
number of Illiac transfers needed is:

for L2: 1
for L6: 1
for L5, 1 ≤ j < m/2:  Σ_{j=1}^{m/2-1} 2^j = 2^{m/2} - 2 = n - 2
for L5, m/2 ≤ j < m:  Σ_{j=m/2}^{m-1} 2^j/n = Σ_{j=m/2}^{m-1} 2^{j-m/2} = Σ_{j=0}^{m/2-1} 2^j = n - 1

Thus, the grand total is 2n-1 transfers. The number of register to register
moves is still m + 1.
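
The construction and its transfer count can be checked with a simulation. The sketch below assumes the usual definitions Illiac_{+1}(P) = P + 1 mod N and Illiac_{+n}(P) = P + n mod N; the helper names are hypothetical, and activity is expressed as a predicate on the PE address rather than as a mask string.

```python
def shuffle(p, m):
    return ((p << 1) | (p >> (m - 1))) & ((1 << m) - 1)


def send(dtr, distance, active):
    """Active PEs send their DTR to PE (i + distance) mod N, all at once."""
    size = len(dtr)
    new = list(dtr)
    for i in range(size):
        if active(i):
            new[(i + distance) % size] = dtr[i]
    return new


def illiac_shuffle(dtr):
    """Return (final DTR contents, number of Illiac transfers)."""
    size = len(dtr)
    m = size.bit_length() - 1
    assert size == 1 << m and m % 2 == 0, "N must be an even power of two"
    n = 1 << (m // 2)                                  # n = sqrt(N)
    even = lambda i: i % 2 == 0
    odd = lambda i: i % 2 == 1
    everyone = lambda i: True
    a = [dtr[i] if even(i) else None for i in range(size)]   # L1
    dtr = send(dtr, 1, odd)                                  # L2: Illiac_{+1}
    transfers = 1
    for j in range(1, m):
        for i in range(0, size, 2):
            if (i >> j) & 1:
                a[i], dtr[i] = dtr[i], a[i]                  # L4
        # L5: emulate PM2_{+j} [X^m] by repeated Illiac_{+1} or Illiac_{+n}
        step, count = (1, 1 << j) if j < m // 2 else (n, (1 << j) // n)
        for _ in range(count):
            dtr = send(dtr, step, everyone)
            transfers += 1
    dtr = send(dtr, 1, even)                                 # L6: Illiac_{+1}
    transfers += 1
    for i in range(0, size, 2):
        dtr[i] = a[i]                                        # L7
    return dtr, transfers


for m in (2, 4, 6):
    result, count = illiac_shuffle(list(range(1 << m)))
    print(1 << m, count)
```

For each size tried, the item P should end in position shuffle(P), and the count should equal 2n - 1.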

In summary, an algorithm to perform the shuffle data permutation using
the Illiac interconnection network has been constructed based on the algorithm
to perform the shuffle using the PM2I network. The algorithm developed for
the Illiac required $2n - 1$ inter-PE transfers.



10.7  Conclusions 


The ability of the PM2I and Illiac type single stage SIMD machine
interconnection networks to perform the shuffle interconnection was examined.
It was previously shown that a lower bound on the number of transfers needed
for the PM2I network to perform the shuffle is $\log_2 N$. The algorithm described
here and proven correct required only $(\log_2 N) + 1$ transfers. Also, an algorithm
for the case where there is a machine with a PM2I network and it is desired to
emulate a shuffle that is of smaller size than the host network was presented.
Using the PM2I algorithm as a basis, an algorithm for the Illiac to emulate the
shuffle is given. Its performance is $2\sqrt{N} - 1$ transfers, which is only three
transfers more than the previously shown lower bound of $2\sqrt{N} - 4$.
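
As a worked illustration of these counts (the value $N = 1024$ is chosen here only as an example), take $m = \log_2 N = 10$ and $n = \sqrt{N} = 32$:

```latex
\begin{align*}
\text{PM2I lower bound:}   \quad & \log_2 N = 10 \\
\text{PM2I algorithm:}     \quad & (\log_2 N) + 1 = 11 \\
\text{Illiac lower bound:} \quad & 2\sqrt{N} - 4 = 60 \\
\text{Illiac algorithm:}   \quad & 2\sqrt{N} - 1 = 63
\end{align*}
```

In both cases the gap between the algorithm and the lower bound is a constant (one transfer for the PM2I, three for the Illiac), independent of $N$.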

These  results  are  of  both  theoretical  and  practical  value.  Theoretically, 
they  add  to  the  body  of  knowledge  about  the  properties  of  the  PM2I  and  Illiac 
networks.  Practically,  the  algorithms  presented  could  actually  be  used  to 
perform  the  shuffle  interconnection  on  a  system  that  has  implemented  the 
PM2I  or  Illiac  type  of  interconnection  network. 




The  two  major  methods  used  to  speed  up  the  execution  phase  of  a 
computational  task  are  (a)  utilization  of  new  materials  to  construct  faster 
devices  and  (b)  exploitation  of  parallel  execution  of  subtasks  of  the  task.  This 
research  was  concerned  with  the  second  method  of  speeding  up  the  execution 
phase of a task. The exploitation of the parallel execution of subtasks of the
task  requires  parallel  computer  architectures.  In  general  a  parallel  computer 
system  consists  of  a  set  of  devices  such  as  processors,  memories,  and  I/O 
devices  that  communicate  through  one  or  more  interconnection  networks. 

Different computer systems use their networks differently, as can be seen
in the following few examples. Some systems use networks dedicated to the
communication between particular subsystems; other systems use a single
network multiplexed for communication among different parts of the system.
In an ensemble parallel system the network is used by the control unit to
broadcast instructions and data. In a pipelined system the interconnection
network is used to provide data communication among the computational units
(segments) of the pipeline. In vector and array parallel systems one
interconnection network is used for interprocessor communication and a usually
separate network is used by the control unit to broadcast data, instructions,
and control information to the processors. In a systolic system the
interconnection network is used to propagate the wave of the partial results
from a set of processors to the next set of processors. In an associative system
the control unit uses the interconnection network to broadcast the selected
data fields to the processors for comparison, and in some cases another network
is used for interprocessor communications. Reconfigurable systems have a
network that allows the system to be statically or dynamically restructured
into multiple machines of different sizes in terms of processors. A data flow
system consisting of multiple rings needs a communication network to move
data among rings.

The computer system designer is faced with two basic tasks: the analysis
and evaluation of existing interconnection networks and the synthesis of
desired interconnection networks. Much research has been done on several
topologically regular interconnection networks. Among the best known
networks are Illiac [BoD72], Shuffle [LaS76], Omega [Law75], multistage Cube
[AdS82b], STARAN [Bat76], ADM [McS82], k-connected mesh [NaS80], and
PM2I [SeS84b]. The researcher usually proceeded as follows: he selected a
network of interest, devised a model for that network, and derived analytical
results based on that network. This approach has the drawback that the
results are network specific since the model is network specific and sometimes
implementation dependent. In addition, most work concentrated on the
analysis of properties of networks and only a little has been done on the
synthesis of networks with desired properties.

Our research differs from the past work in the following aspects. First, a
unified approach to the analysis of interconnection networks that is valid for
large classes of interconnection networks was developed. The approach is
unified in the sense that it does not assume a particular network or an
implementation but considers networks as a set. Second, two algorithms that
allow systematic design of networks with the desired property of
partitionability were developed. In more detail, the following related topics of
topological properties of parallel computer systems were studied.

In  Chapter  4  a  general,  implementation  independent  model  for  single  stage 
interconnection  networks  was  developed.  The  model  can  be  used  to  analyze 
both  topologically  regular  and  irregular  single  stage  interconnection  networks. 


The network model was extended to the modeling of parallel computer
systems. A system informally consists of a set of devices, an interconnection
network, and the method of use of the network by the devices. Three different
types of systems were defined, based upon the method of use of the network,
and several relationships between systems were analyzed. The system types
recognized are recirculating, nonrecirculating, and partially recirculating. In a
recirculating system each device $d_i$ has its logical output port connected to an
input label of the interconnection network and its input port connected to an
output label of the interconnection network. One result of this configuration is
that the system can generate different connection patterns using multiple
passes through the network. Also, for a recirculating system $|V_I| = |V_O|$. In
a nonrecirculating system, each device is connected only to a network input or
(exclusively) to a network output. This type of configuration appears
frequently in real time digital signal processing systems. The result of this
configuration is that no new connection patterns can be achieved by multiple
passes, since it is not possible to move the data from the output of the network
back to its input. A partially recirculating system contains some, but not all,
devices each of which has its output port connected to a network input label
and its input port connected to an output label of the network. If
$|V_I| \ne |V_O|$ then the system cannot be recirculating and can only be either
partially recirculating or nonrecirculating.
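
The three system types can be illustrated with a small Python sketch. It is a simplified reading of the informal definitions above, not the formal model of Chapter 4: the `Device` record, its field names, and `classify_system` are hypothetical, and a device is treated as recirculating exactly when it is attached to both a network input label and a network output label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical device record used only for this illustration."""
    name: str
    net_input: Optional[int] = None   # network input label driven by the device's output port
    net_output: Optional[int] = None  # network output label feeding the device's input port


def classify_system(devices: List[Device]) -> str:
    """Classify a system by how its devices attach to the network."""
    if not devices:
        raise ValueError("a system needs at least one device")
    # Devices attached on both sides can send data back through the network.
    both_sides = [d for d in devices
                  if d.net_input is not None and d.net_output is not None]
    if len(both_sides) == len(devices):
        return "recirculating"
    if not both_sides:
        return "nonrecirculating"
    return "partially recirculating"


pes = [Device(f"PE{i}", net_input=i, net_output=i) for i in range(4)]
sensors = [Device(f"sensor{i}", net_input=i) for i in range(4)]
sinks = [Device(f"memory{i}", net_output=i) for i in range(2)]

print(classify_system(pes))               # every device on both sides
print(classify_system(sensors + sinks))   # each device on one side only
print(classify_system(pes[:2] + sinks))   # a mixture of the two
```

In the second example there are four network inputs and two network outputs, so $|V_I| \ne |V_O|$ and the system could not be recirculating under any attachment of devices.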

The previous method of classification of the relationship between two
networks $K^1$ and $K^2$ used only two categories: (a) $K^1$ and $K^2$ are isomorphic
and (b) $K^1$ and $K^2$ are not isomorphic. Our method refined the classification of
the relationship between two networks into the following categories, presented
in the order of decreasing similarity: (a) $K^1$ is isomorphic to $K^2$, (b) $K^1$ is a
subnetwork of type c of $K^2$, (c) $K^1$ is a subnetwork of type b of $K^2$, (d) none of
the above. The method was expanded to classify the relationship between two
systems $S^1$ and $S^2$. The categories are similar except that, in addition to the
above, another category is possible after “type b,” and that is “type a.”

In Chapter 5 the measure of similarity between two systems was expanded
to include arbitrary labeling. Recall that in Chapter 4 the comparison between
system $S^1$ over $V_I^1 \times V_O^1$ and system $S^2$ over $V_I^2 \times V_O^2$ assumed the labeling
was such that $V_I^1 \times V_O^1 \subset V_I^2 \times V_O^2$. If this condition does not hold, that
does not mean that the two systems are necessarily dissimilar. It could be that
the labeling of the two systems is different. To handle this problem, the
concept of quasimorphism of systems was developed. Quasimorphism allows
comparison of randomly labeled systems with arbitrary topologies. The
problem of comparison of systems can be formulated as finding relationships
between two S-sets. The problem is very complex and therefore was broken
down into two major steps. First the T-set over $V_I \times V_O$ was defined. The
T-set has fewer constraints than the S-set over the same $V_I \times V_O$ and therefore
it is easier to analyze relationships between T-sets than between S-sets. The
quasimorphism, denoted by $\psi$, is uniquely determined by two maps $\psi_I$ and $\psi_O$.
Some behavior of $\psi_I$, $\psi_O$ was inherited by the $\psi$-correspondence $\psi$. For example, if
the $\psi_I$-map $\psi_I$ and the $\psi_O$-map $\psi_O$ are 1:1 maps then the $\psi$-correspondence $\psi$ is a 1:1
correspondence. Conversely, if a $\psi$-correspondence $\psi$ is 1:1 then the $\psi_I$-map $\psi_I$
and the $\psi_O$-map $\psi_O$ are 1:1 maps. Properties of the $\psi$-correspondence $\psi$ similar
to the reflexive, symmetric, and transitive properties of relations were
discussed; in particular the following were shown. Let $S^1$, $S^2$, and $S^3$ be three
systems. A quasimorphism has the following properties.




(1) $\exists\, \psi$ such that $\psi(S^1) = S^1$.

(2) $\psi^1(S^1) = S^2$   $\exists\, \psi^2$ such that $\psi^2(S^2) = S^1$.

(3) $\psi^1(S^1) = S^2$ and $\psi^2(S^2) = S^3$ $\rightarrow$ $\exists\, \psi$, $\psi(S^1) = S^3$.

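Let $S^1$, $S^2$, and $S^3$ be three systems. A 1:1 quasimorphism has the following
properties.

(1) $\exists\, \psi$, 1:1, such that $\psi(S^1) = S^1$.

(2) $\psi^1(S^1) = S^2$, 1:1 $\rightarrow$ $\exists\, \psi^2$, 1:1, such that $\psi^2(S^2) = S^1$.

(3) $\psi^1(S^1) = S^2$, 1:1, and $\psi^2(S^2) = S^3$, 1:1 $\rightarrow$ $\exists\, \psi$, 1:1, $\psi(S^1) = S^3$.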

The  quasimorphism  measure  provides  a  theoretical  background  for 
studying  the  following  problems  of  parallel  processing. 

(a) Emulation of system $S^1$ by system $S^2$.

(b)  Fault  tolerance  method  achieved  by  concurrent  execution  of  multiple 
copies  of  the  same  problem. 

(c)  Partitioning  of  a  system. 

Three types of emulation were defined based upon the subsystem
relationship between the image of the emulated system and the host system.
Several measures of efficiency of the emulation based upon the preservation of
the computational loading and other factors were defined and the emulation
types were evaluated on that basis. For example, in a system where the input
nodes are connected to processors and the output nodes to memories, the
factors have the following physical meaning. If the input node factor = 1, then
the computational load of each processor in the emulated system is the same as
the computational load in the image of the emulated system in the host system.
If the output node factor = 1, then the amount of data stored in each of the
memories of the emulated system is the same as the amount of the data stored in
the image of the emulated system in the host system. Factors greater than 1
imply a heavier load in terms of computation or amount of data stored per
memory unit than in the host system.

In  Chapter  6  operations  on  single  stage  networks  such  as  composition  and 
decomposition  were  defined.  Using  these  primitives,  the  partitionability  of 
single stage networks was studied. Partitionability informally means that the
network can be divided into several parts with a certain amount of independence.
The  partitionability  property  is  important  for  the  reasons  detailed  in  the 
chapter. 

Three types of partitionability were recognized and an algorithm was
developed which will output one of the following:

(1) The network is not partitionable.

(2) The network is partitionable into subnetworks with common control
signals and the combination of the subnetworks will exactly
generate all interconnection patterns of the original network.

(3)  The  network  is  partitionable  into  subnetworks  with  separate  control 
signals  and  the  combination  of  the  subnetworks  will  exactly  generate  all 
interconnection  patterns  of  the  original  network. 

(4)  The  network  is  partitionable  into  subnetworks  with  separate  control 
signals  and  the  combination  of  the  subnetworks  will  generate  a  superset 
of  interconnection  patterns  of  the  original  network. 

The  algorithm  is  network  topology  independent  and  can  be  used  to 
analyze  topologically  regular  and  irregular  interconnection  networks. 


In Chapter 7 two techniques of synthesis of single stage partitionable
networks were developed. Each of these techniques allows the design of a large
class of partitionable single stage interconnection networks. The specification
of the construction is given in terms of properties of the individual I/O
correspondences.

In Chapter 8 multistage networks were studied. A composition of single
stage networks was defined and its properties studied. Using the model of the
single stage network and the composition above, the multistage network was
defined. The model is very general since each stage of the multistage network
is a topologically general single stage network. Several examples of the
application of the model were presented.

In Chapter 9 a network and network interfaces were designed for a
real-time, distributed digital signal processing system. The design was subject to
a number of system constraints such as very high throughput, system
extendibility, and fault tolerance requirements. For this application, and given
the current and near future technology, a crossbar based interconnection
network was very well-suited to the task under consideration. Two different
fault tolerant chip architectures were presented. Four network architectures
were designed and their characteristics described. Several fault detection and
recovery techniques on the system level were shown, since fault tolerance is
a salient issue of this system.

In Chapter 10 the ability of the PM2I and Illiac type single stage SIMD
machine interconnection networks to perform the shuffle interconnection was
examined. It was previously shown that a lower bound on the number of
transfers needed for the PM2I network to perform the shuffle is $\log_2 N$. The
algorithm described here and proven correct required only $(\log_2 N) + 1$ transfers.
Also, an algorithm for the case where there is a machine with a PM2I network
and it is desired to emulate a shuffle that is of smaller size than the host
network was presented. Using the PM2I algorithm as a basis, an algorithm for
the Illiac to emulate the shuffle was developed. Its performance is $2\sqrt{N} - 1$
transfers, which is only three transfers more than the previously shown lower
bound of $2\sqrt{N} - 4$.


LIST  OF  REFERENCES 


[AdS82a]  G. B. Adams III and H. J. Siegel, “On the number of permutations
performable by the augmented data manipulator network,” IEEE
Trans. Comput., Vol. C-31, Apr. 1982, pp. 270-277.

[AdS82b]  G. B. Adams III and H. J. Siegel, “The extra stage cube: a
fault-tolerant interconnection network for supersystems,” IEEE Trans.
Comput., Vol. C-31, May 1982, pp. 443-454.

[BaB68]  G. H. Barnes, R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick,
and R. A. Stokes, “The Illiac IV computer,” IEEE Trans.
Comput., Vol. C-17, Aug. 1968, pp. 746-757.

[Bae80]  J. L. Baer, Computer Systems Architecture, Computer Science
Press, Potomac, MD, 1980.

[Bat74]  K. E. Batcher, “STARAN Parallel Processor System Hardware,”
AFIPS Conf. Proc., 1974 Nat'l. Computer Conf., May 1974, pp.
405-410.

[Bat76]  K. E. Batcher, “The flip network in STARAN,” 1976 Int'l. Conf.
Parallel Processing, Aug. 1976, pp. 65-71.

[Bat77]  K. E. Batcher, “STARAN series E,” 1977 Int'l. Conf. Parallel
Processing, Aug. 1977, pp. 140-143.

[Bat80]  K. E. Batcher, “Design of a Massively Parallel Processor,” IEEE
Trans. Comput., Vol. C-29, Sept. 1980, pp. 836-840.

[BeK79]  J. L. Bentley and H. T. Kung, “A Tree Machine for Searching
Problems,” 1979 Int'l. Conf. Parallel Processing, Aug. 1979, pp.
257-266.


[Ben65]  V. E. Benes, Mathematical Theory of Connecting Networks and
Telephone Traffic, Academic Press, New York, NY, 1965.

[Ben74]  V. E. Benes, “Applications of group theory to connecting
networks,” The Bell System Technical Journal, Vol. 54, Feb. 1975,
pp. 407-420.

[BoD72]  W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall,
A. H. Sameh, and D. L. Slotnick, “The Illiac IV System,” Proc.
IEEE, Vol. 60, Apr. 1972, pp. 369-388.

[BoM76]  J. A. Bondy and U. S. R. Murty, Graph Theory with Applications,
North Holland Publishing, 1976.

[BuK71]  P. Budnick and D. J. Kuck, “The organization and use of parallel
memories,” IEEE Trans. Comput., Vol. C-20, Dec. 1971, pp.
1566-1569.

[ChL81]  P-Y. Chen, D. H. Lawrie, P-C. Yew, and D. A. Padua,
“Interconnection networks using shuffles,” Computer, Vol. 14, Dec.
1981, pp. 55-64.

[CoG74]  G. R. Couranz, M. S. Gerhardt, and C. J. Young, “Programmable
RADAR signal processing using the RAP,” 1974 Sagamore
Computer Conf. Parallel Processing, Aug. 1974, pp. 37-52.

[DeM83]  G. L. DeMuth, “A distributed signal processor incorporating VLSI
and high order language programming,” 1983 Int'l. Conf.
Acoustics, Speech, and Signal Processing, Apr. 1983, pp. 439-442.

[FeF74]  J. D. Feldman and L. C. Fulmer, “RADCAP - An operational
parallel processing facility,” 1974 Nat'l. Computer Conf., May
1974, pp. 7-15.

[Fen74]  T. Feng, “Data manipulating functions in parallel processors and
their implementations,” IEEE Trans. Comput., Vol. C-23, Mar.
1974, pp. 309-318.

[Fen81]  T. Feng, “A survey of interconnection networks,” Computer, Vol.
14, Dec. 1981, pp. 12-27.

[FiF82]  J. P. Fishburn and R. A. Finkel, “Quotient networks,” IEEE
Trans. Comput., Vol. C-31, Apr. 1982, pp. 288-295.

[Fly66]  M. J. Flynn, “Very high-speed computing systems,” Proc. of the
IEEE, Vol. 54, Dec. 1966, pp. 1901-1909.

[FrW81]  M. A. Franklin and D. F. Wann, “Pin limitations and VLSI
interconnection networks,” 1981 Int'l. Conf. Parallel Processing,
Aug. 1981, pp. 253-258.

[FrW82]  M. A. Franklin, D. F. Wann, and W. J. Thomas, “Pin limitations
and partitioning of VLSI interconnection networks,” IEEE Trans.
Computers, Vol. C-31, Nov. 1982, pp. 1109-1116.

[Gok76]  L. R. Goke, “Banyan networks for partitioning multiprocessors
systems,” Ph.D. Thesis, University of Florida, June 1976.

[GoL73]  L. R. Goke and G. J. Lipovski, “Banyan networks for partitioning
multiprocessing systems,” 1st Annual Symp. Computer
Architecture, Dec. 1973, pp. 21-28.

[HaL82]  L. S. Haynes, R. L. Lau, D. P. Siewiorek, and D. W. Mizell, “A survey
of highly parallel computing,” Computer, Vol. 15, Jan. 1982, pp.
9-24.

[Han68]  C. B. Hanneken, Introduction to Abstract Algebra, Dickenson
Publishing, 1968.

[Har69]  F. Harary, Graph Theory, Addison Wesley Publisher, 1969.

[Her75]  I. N. Herstein, Topics in Algebra, Xerox College Publishing, 1975.

[Hig72]  L. C. Higbie, “The Omen computer: associative array processor,”
IEEE Computer Society Compcon 72, Sept. 1972, pp. 287-290.

[Hun81]  D. J. Hunt, “The ICL DAP and its application to image
processing,” in Languages and Architectures for Image Processing,
M. J. B. Duff and S. Levialdi, eds., Academic Press, London,
England, 1981, pp. 275-282.

[KaP80]  R. N. Kapur, U. V. Premkumar, and G. J. Lipovski,
“Organization of the TRAC processor-memory subsystem,” 1980
Nat'l. Computer Conf., May 1980, pp. 623-629.

[KoT80]  E. W. Kozdrowicki and D. J. Theis, “Second generation of vector
supercomputers,” Computer, Vol. 13, Nov. 1980, pp. 71-83.

[KuL78]  H. T. Kung and C. E. Leiserson, “Systolic arrays (for VLSI),”
Proc. Symp. Sparse Matrix Computations and Their Applications,
Nov. 1978, pp. 256-282.

[Kun82]  H. T. Kung, “Why Systolic Architectures?,” Computer, Vol. 15,
Jan. 1982, pp. 37-46.

[Lan76]  T. Lang, “Interconnections between processors and memory
modules using the shuffle-exchange network,” IEEE Trans.
Comput., Vol. C-25, May 1976, pp. 496-503.

[LaS76]  T. Lang and H. S. Stone, “A shuffle-exchange network with
simplified control,” IEEE Trans. Comput., Vol. C-25, Jan. 1976,
pp. 55-66.

[Law75]  D. H. Lawrie, “Access and alignment of data in an array
processor,” IEEE Trans. Comput., Vol. C-24, Dec. 1975, pp.
1145-1155.

[LiM82]  G. J. Lipovski and M. Malek, “A Theory for Interconnection
Networks,” Electrical Engineering Department, University of Texas
at Austin, TRAC Report 41, Oct. 1982.

[MaM81a]  B. A. Makrucki and T. N. Mudge, “VLSI design of a crossbar
switch,” Technical Report SEL-TR-149, Dept. of Electrical and
Computer Engineering, The University of Michigan, Ann Arbor,
MI, Jan. 1981.


[MaM81b]  M. Malek and W. W. Myre, “A description method of
interconnection network,” Distributed Processing, Vol. 1, No. 1,
Feb. 1981, pp. 1-6.

[McS82]  R. J. McMillen and H. J. Siegel, “Routing schemes for the
augmented data manipulator network in an MIMD system,” IEEE
Trans. Comput., Vol. C-31, Dec. 1982, pp. 1202-1214.

[McT82]  S. McFarling, J. Turney, and T. N. Mudge, “VLSI design version
two,” Technical Report CRL-TR-8-82, Computing Research
Laboratory, The University of Michigan, Ann Arbor, MI, Feb.
1982.

[NaS80]  D. Nassimi and S. Sahni, “An optimal routing algorithm for
mesh-connected parallel computers,” J. Ass. Comput. Mach., Vol.
27, Jan. 1980, pp. 6-29.

[NaS81]  D. Nassimi and S. Sahni, “Data broadcasting in SIMD computers,”
IEEE Trans. Comput., Vol. C-30, Feb. 1981, pp. 101-107.

[NaS82]  D. Nassimi and S. Sahni, “Parallel permutation and sorting
algorithms and a new generalized connection network,” Journal of
the ACM, Vol. 29, July 1982, pp. 642-667.

[Nut77]  G. J. Nutt, “Microprocessor implementation of a parallel
processor,” 4th Symp. Computer Architecture, Mar. 1977, pp.
147-152.

[OkT82]  Y. Okada, H. Tajima, and R. Mori, “A reconfigurable parallel
processor with microprogram control,” IEEE Micro, Vol. 2, Nov.
1982, pp. 48-60.

[Orc76]  S. E. Orcutt, “Implementation of permutation functions in Illiac
IV-type computers,” IEEE Trans. Comput., Vol. C-25, Sept. 1976,
pp. 929-936.

[PaR82]  D. S. Parker and C. S. Raghavendra, “The gamma network: a
multiprocessor interconnection network with redundant paths,” 9th
Annual Symp. Computer Architecture, Apr. 1982, pp. 73-80.


[Pea77]  M. C. Pease, “The indirect binary n-cube microprocessor array,”
IEEE Trans. Comput., Vol. C-26, May 1977, pp. 458-473.

[PrK80]  D. K. Pradhan and K. L. Kodandapani, “A uniform representation
of single- and multistage interconnection networks used in SIMD
machines,” IEEE Trans. Comput., Vol. C-29, Sept. 1980, pp.
777-791.

[RaF83]  L. V. Ramakrishnan, D. S. Fussell, and A. Silberschatz, “On
mapping homogeneous graphs on a linear array-processor model,”
1983 Int'l. Conf. Parallel Processing, Aug. 1983, pp. 440-447.

[ReS82]  A. P. Reeves and R. R. Seban, “The moment computer,” 15th
Hawaii Int'l. Conf. System Sciences, Jan. 1982, pp. 388-396.

[RoP77]  D. Rohrbacher and J. L. Potter, “Image processing with the
STARAN parallel computer,” Computer, Vol. 10, June 1977, pp.
54-59.

[Seb82]  R. R. Seban, “Parallel Computer Architecture Parallel Algorithms
and the Theory of Image Analysis Using Cartesian Moments,”
Master's Thesis, Purdue University, Aug. 1982.

[SeS84a]  R. R. Seban and H. J. Siegel, “Theoretical modeling and analysis
of special purpose interconnection networks,” 4th Int'l. Conf.
Distributed Computing Systems, May 1984, pp. 256-265.

[SeS84b]  R. R. Seban and H. J. Siegel, “Shuffling with Illiac and PM2I
SIMD networks,” IEEE Trans. Comput., Vol. C-33, July 1984, pp.
619-625.

[SeS84c]  R. R. Seban, H. J. Siegel, and D. G. Meyer, “Data communications
in a Real-Time distributed signal processing system: A case
study,” 1984 Real-Time Systems Symp., Dec. 1984.

[SeS85]  R. R. Seban and H. J. Siegel, “Analysis of partitionability
properties of topologically arbitrary interconnection networks,” 5th
Int'l. Conf. Distributed Computing Systems, May 1985, pp. 173-181.


M. C. Sejnowski, E. T. Upchurch, R. N. Kapur, D. P. S. Charlu,
and G. J. Lipovski, “An overview of the Texas Reconfigurable
Array Computer,” 1980 Nat'l. Computer Conf., May 1980, pp.
631-641.

H. J. Siegel, “Analysis techniques for SIMD machine
interconnection networks and the effects of processor address
masks,” IEEE Trans. Comput., Vol. C-26, Feb. 1977, pp. 153-161.

H. J. Siegel, “Interconnection networks for SIMD machines,”
Computer, Vol. 12, June 1979, pp. 57-65.

H. J. Siegel, “A model of SIMD machines and a comparison of
various interconnection networks,” IEEE Trans. Comput., Vol.
C-28, Dec. 1979, pp. 907-917.

H. J. Siegel, “The theory underlying the partitioning of
permutation networks,” IEEE Trans. Comput., Vol. C-29, Sept.
1980, pp. 791-801.

H. J. Siegel, Interconnection Networks for Large Scale Parallel
Processing: Theory and Case Studies, Lexington Books, Lexington,
MA, 1985.

H. J. Siegel and R. J. McMillen, “The multistage cube: A versatile
interconnection network,” Computer, Vol. 14, Dec. 1981, pp. 65-76.

H. J. Siegel, D. G. Meyer, E. J. Coyle, R. R. Seban, S. Hutchinson,
and B. Young, “Interconnection networks for a distributed signal
processing system,” Report for IBM Federal Systems Division,
Manassas, VA, Dec. 1983.

H. J. Siegel, L. J. Siegel, R. J. McMillen, P. T. Mueller, Jr., and S. D.
Smith, “An SIMD/MIMD multimicroprocessor system for image
processing and pattern recognition,” 1979 IEEE Comp. Soc. Conf.
Pattern Recognition and Image Processing, Aug. 1979, pp. 214-224.

H. J. Siegel, L. J. Siegel, F. C. Kemmerer, P. T. Mueller, Jr., H. E.
Smalley, and S. D. Smith, “PASM: a partitionable SIMD/MIMD
system for image processing and pattern recognition,” IEEE Trans.
Comput., Vol. C-30, Dec. 1981, pp. 934-947.


[SiS84]  H. J. Siegel, T. Schwederski, N. J. Davis IV, and J. T. Kuehn,
“PASM: A reconfigurable parallel system for image processing,”
Computer Architecture News, Vol. 12, No. 4, Sep. 1984, pp. 7-19.

[Sny82]  L. Snyder, “Introduction to the Configurable Highly Parallel
Computer,” Computer, Vol. 15, Jan. 1982, pp. 47-56.

[Sto71]  H. S. Stone, “Parallel processing with the perfect shuffle,” IEEE
Trans. Comput., Vol. C-20, Feb. 1971, pp. 153-161.

[Sto80]  H. S. Stone, Introduction to Computer Architecture, Science
Research Associates, Chicago, IL, 1980.

[ThC83]  J. E. Thornton and G. S. Christensen, “Hyperchannel network
links,” Computer, Vol. 16, Sept. 1983, pp. 50-54.

[The74]  D. J. Theis, “Vector supercomputers,” Computer, Vol. 7, April
1974, pp. 52-61.

[ThN81]  S. Thanawastien and V. P. Nelson, “Interference analysis of
shuffle/exchange networks,” IEEE Trans. Computers, Vol. C-30,
Aug. 1981, pp. 545-556.

[ThW75]  K. J. Thurber and L. D. Wald, “Associative and Parallel
Processors,” Computing Surveys, Vol. 7, Dec. 1975, pp. 215-255.

[TuS83]  D. L. Tuomenoksa and H. J. Siegel, “Analysis of multiple-queue
task scheduling algorithms for multiple-SIMD machines,” 3rd Int'l.
Conf. Distributed Computing Systems, Oct. 1982, pp. 114-121.

[Upp81]  P. V. Uppaluru, “A theoretical basis for analysis and partitioning
of regular SW banyans,” Ph.D. Thesis, The University of Texas at
Austin, 1981.

[ViC78]  C. R. Vick and J. A. Cornell, “PEPE architecture - Present and
future,” 1978 Nat'l. Computer Conf., June 1978, pp. 981-992.

[WaF83]  D. F. Wann and M. A. Franklin, “Asynchronous and clocked
control structures for VLSI based interconnection networks,” IEEE
Trans. Computers, Vol. C-32, Mar. 1983, pp. 284-293.

[WaG82]  I. Watson and J. Gurd, “A practical data flow computer,”
Computer, Vol. 15, Feb. 1982, pp. 51-57.

[WuF80]  C. Wu and T. Feng, “On a class of multistage interconnection
networks,” IEEE Trans. Comput., Vol. C-29, Aug. 1980, pp.
694-702.

[WuF81]  C. Wu and T. Feng, “The universality of the shuffle-exchange
network,” IEEE Trans. Comput., Vol. C-30, May 1981, pp.
324-332.

[WuF84]  C. Wu and T. Feng, Tutorial: Interconnection Networks for
Parallel and Distributed Processing, IEEE Computer Society Press,
Silver Spring, MD, 1984.

[YaF77]  S. S. Yau and H. S. Fung, “Associative processor architecture - A
survey,” Computing Surveys, Vol. 9, March 1977, pp. 3-27.


Robert R. Seban received the B.E. degree in electrical engineering,
magna cum laude, from the City College of New York in 1976. From 1976 to
1980 he worked in industry for Sperry Univac, IBM, and Advanced Technology
Systems. In industry he held the positions of logic and system designer and
project supervisor. In 1980 he returned to graduate school at Purdue
University. He received the M.S. in 1982 and the Ph.D. in 1985, both in electrical
engineering from Purdue University. While at Purdue he was a recipient of
a David Ross fellowship.

Dr.  Seban  has  written  one  journal  publication,  five  conference  papers,  and 
several  technical  reports.  His  current  research  interests  include  computer 
architectures  for  parallel  processing  and  interconnection  networks. 

He is a member of Eta Kappa Nu Honorary Society, Sigma Xi Scientific
Research Society, and IEEE Computer Society. His hobbies include swimming,
flying, and racquetball.


